TESTING INDEPENDENCE IN HIGH DIMENSIONS WITH SUMS OF

RANK CORRELATIONS 

DENNIS LEUNG AND MATHIAS DRTON 


Abstract. We treat the problem of testing independence between m continuous variables 
when m can be larger than the available sample size n. We consider three types of test 
statistics that are constructed as sums or sums of squares of pairwise rank correlations. 
In the asymptotic regime where both m and n tend to infinity, a martingale central limit 
theorem is applied to show that the null distributions of these statistics converge to Gaussian 
limits, which are valid with no specific distributional or moment assumptions on the data. 
Using the framework of U-statistics, our result covers a variety of rank correlations including 
Kendall’s tau and a dominating term of Spearman’s rank correlation coefficient (rho), but
also degenerate U-statistics such as Hoeffding’s D, or the $\tau^*$ of Bergsma and Dassios (2014).
As in the classical theory for U-statistics, the test statistics need to be scaled differently 
when the rank correlations used to construct them are degenerate U-statistics. The power of 
the considered tests is explored in rate-optimality theory under a Gaussian equicorrelation 
alternative as well as in numerical experiments for specific cases of more general alternatives. 


1. Introduction 


This paper is concerned with nonparametric tests of independence between the coordinates
of a continuous random vector $X = (X^{(1)}, \dots, X^{(m)})$. Let $X_1, \dots, X_n$ be an i.i.d. sample, with
each $X_i = (X_i^{(1)}, \dots, X_i^{(m)})$ following the same distribution as $X$. We then wish to test the
null hypothesis

(1.1)    $H_0 : X^{(1)}, \dots, X^{(m)}$ are independent.

The natural approach is to form a test statistic that measures the dependence among the
variables $X^{(1)}, \dots, X^{(m)}$ based on the sample, and reject $H_0$ when its value is too large, where
the critical value of rejection is calibrated by the asymptotic distribution of the test statistic
under the null. Our focus is on the use of rank correlations in problems where the dimension
$m$ can be larger than the sample size $n$. Specifically, our testing procedures will be studied
under the asymptotic regime where $m = m(n)$ grows as a function of $n$ such that $m$ also tends
to infinity. This regime is denoted by $m, n \to \infty$ throughout our paper.

There is a vast literature on the problem of testing independence. If $X$ is normal, then under
the traditional asymptotic setup in which $n$ goes to $\infty$ while $m$ is fixed, the likelihood ratio test
(LRT) statistic converges to a chi-square distribution when $H_0$ is true (Anderson, 2003). This
test is known to be unimplementable for $m > n$ due to the singularity of the sample covariance
matrix, but recent work of Jiang and Qi (2015, Corollary 1) shows asymptotic normality for
the LRT statistic under the regime where $m, n \to \infty$ while $n > m + 4$. When $m$ can actually
be larger than $n$, one line of work uses the maximum of many pairwise dependency measures to
test for (1.1). For $p = 1, \dots, m$, let $\mathbf{X}^{(p)} = (X_1^{(p)}, \dots, X_n^{(p)})$ be the sample of observations for
the $p$-th variable. For $1 \le p \ne q \le m$, let $r^{(pq)}$ denote the sample Pearson (product-moment)
correlation of $\mathbf{X}^{(p)}$ and $\mathbf{X}^{(q)}$. Jiang (2004) proved that, under suitable centering and scaling,
the null distribution of the statistic

(1.2)    $\max_{1 \le p < q \le m} (r^{(pq)})^2$

converges to an extreme value distribution of type 1 when $m/n$ converges to a constant
$\gamma \in (0, \infty)$ as $m, n \to \infty$. We will abbreviate such convergence as $m/n \to \gamma \in (0, \infty)$. He
assumed higher-order moment conditions that were weakened in subsequent work (Li et al.,
2010, 2012; Liu et al., 2008; Zhou, 2007). Cai and Jiang (2011) derived a similar asymptotic
distribution for the statistic from (1.2), allowing for subexponential growth in the dimension $m$.
Further weakening distributional assumptions, the recent work of Han and Liu (2014) treated
maxima of rank correlations, that is, the sample Pearson correlation in (1.2) is replaced by a
rank correlation measure such as Kendall’s tau. This maximum was shown to have a similar
extreme value type null distribution. Statistics such as (1.2) are of obvious appeal when strong
dependence is expected between some variables.

This paper, however, aligns with a different approach that is appealing when moderate
dependence is expected between many variables. In this approach, tests are based on estimates of
the sum of many pairwise dependency signals. Let $\Sigma = (\sigma^{(pq)})$ and $R = (\rho^{(pq)})$ be, respectively,
the population covariance and Pearson correlation matrix of the random vector $X$. Under a
Gaussian assumption for $X$, Schott (2005) proposed the use of the “plug-in” estimate

(1.3)    $S_r := \sum_{1 \le p < q \le m} (r^{(pq)})^2$

for the overall dependency signal $\sum_{1 \le p < q \le m} (\rho^{(pq)})^2$. Subsequent work of Chen and Shao (2012)
obtained a Berry-Esseen bound for this statistic’s weak convergence to normality under $H_0$
as $m, n \to \infty$. The statistic $S_r$ is in fact Rao’s score statistic for the multivariate normal
setting; see Appendix A. Mao (2014) suggested a related statistic, namely, the sum of $f(r^{(pq)})$
for $f(x) = x^2/(1 - x^2)$, and again the null distribution is shown to be asymptotically normal.

For the two related problems of testing the equality and the proportionality of $\Sigma$ to the identity
matrix, similar statistics have been studied (John, 1972; Ledoit and Wolf, 2002; Nagao, 1973).
Motivated by this approach, we construct our first class of test statistics by plugging in rank
correlations to obtain nonparametric tests for (1.1). We illustrate it here for Kendall’s tau. For
$1 \le p \ne q \le m$, let

(1.4)    $\tau^{(pq)} := \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} \mathrm{sgn}(X_i^{(p)} - X_j^{(p)})\, \mathrm{sgn}(X_i^{(q)} - X_j^{(q)})$

be the sample Kendall’s tau correlation coefficient for $\mathbf{X}^{(p)}$ and $\mathbf{X}^{(q)}$. A natural test is then to
reject $H_0$ for large values of the statistic

(1.5)    $S_\tau := \sum_{1 \le p < q \le m} (\tau^{(pq)})^2$.

As an estimator of the dependency signal

(1.6)    $\sum_{1 \le p < q \le m} (E[\tau^{(pq)}])^2$,

the “plug-in” statistic $S_\tau$ from (1.5) is biased and thus needs to be recentered to obtain a mean
zero asymptotic null distribution under our considered regime $m, n \to \infty$. Alternatively, we
may instead attempt to form an unbiased estimator of (1.6) to serve as a test statistic. As
shown in Section 3, such an unbiased estimator is given by


(1.7)    $T_\tau := \frac{1}{4!\binom{n}{4}} \sum \mathrm{sgn}(X^{(p)}_{i_{\pi(1)}} - X^{(p)}_{i_{\pi(2)}})\, \mathrm{sgn}(X^{(p)}_{i_{\pi(3)}} - X^{(p)}_{i_{\pi(4)}}) \times \mathrm{sgn}(X^{(q)}_{i_{\pi(1)}} - X^{(q)}_{i_{\pi(2)}})\, \mathrm{sgn}(X^{(q)}_{i_{\pi(3)}} - X^{(q)}_{i_{\pi(4)}})$,

where the summation is over all variable pairs $1 \le p < q \le m$, ordered 4-tuples of indices
$1 \le i_1 < i_2 < i_3 < i_4 \le n$, and permutations $\pi$ on four elements. This type of statistic is
motivated by the work of Chen et al. (2010) and Cai and Ma (2013), who tested the equality
of $\Sigma$ to the identity based on unbiased estimates of the squared Frobenius norm $\|\Sigma - I_m\|_F^2$,
where $I_m$ is the $m$-by-$m$ identity matrix. Under a Gaussian assumption for $X$, Cai and Ma
(2013) showed their test to be asymptotically minimax rate optimal.

As a last variant, when testing for positive associations, it may be of interest to consider the
statistic

(1.8)    $Z_\tau := \sum_{1 \le p < q \le m} \tau^{(pq)}$,

which sums all pairwise sample correlations for a “one-sided” test. As we explain below, such
a statistic also provides a “two-sided” test for $H_0$ when rank correlations such as the $\tau^*$ of
Bergsma and Dassios (2014) are used. In Section 4 we show that all the statistics introduced
above are asymptotically normal under suitable recentering and rescaling.

Kendall’s tau is an example of a U-statistic whose values depend on the data only via
ranks (van der Vaart, 1998, Example 12.5). Indeed, the values of (1.4), (1.7) and (1.8) remain
unchanged if each observation $X_i^{(p)}$ is replaced with its rank $R_i^{(p)}$. To be specific, $R_i^{(p)}$ is the
rank of $X_i^{(p)}$ among $X_1^{(p)}, \dots, X_n^{(p)}$. Other examples of measures of association that are both
U-statistics and rank correlations are the $D$ of Hoeffding (1948b) and the aforementioned $\tau^*$ of
Bergsma and Dassios (2014). We note that for a pair of continuous random variables both of
these statistics lead to consistent tests of independence, that is, their expectations are zero if
and only if the two random variables are independent. Another classical example is Spearman’s
rho, which is not a U-statistic but can be approximated by a rank-based U-statistic.

The above examples of U-statistics are reviewed in Section 2, which also introduces a general
framework of rank-based U-statistics that we adopt for a unified theory. In Section 3 we
construct our classes of test statistics for the null hypothesis $H_0$ from (1.1). Their asymptotic
null distributions when $m, n \to \infty$ are derived in Section 4. Our arguments make use of
a central limit theorem for martingale arrays and U-statistic theory. We emphasize that all
our statistics admit a normal limit after appropriate rescaling, but just as in the classical
theory for U-statistics, the scaling factors have a different order when degenerate U-statistics
are considered. In Section 5 we explore aspects of power of our tests from a minimax point of
view. Simulation experiments are presented in Section 6, which also discusses computational
considerations in the implementation of the tests. Throughout, for our null distributional
theory, we make no distributional or moment assumption on $(X^{(1)}, \dots, X^{(m)})$ other than that
it is a continuous random vector. This assumption is needed to avoid ties in observations and
ranks. We conclude with a brief discussion in Section 7.


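1.1. Notational convention. For $p \in \{1, \dots, m\}$, we let $\mathbf{R}^{(p)} := (R_1^{(p)}, \dots, R_n^{(p)})$ be the
vector of ranks of $\mathbf{X}^{(p)} = (X_1^{(p)}, \dots, X_n^{(p)})$. The symmetric group of order $l$ is denoted by $\mathfrak{S}_l$.
Depending on the context, its elements are treated either as permutation functions or as ordered
tuples from the set $\{1, \dots, l\}$. For $k \le n$, $\mathcal{P}(n, k)$ denotes the set of $k$-tuples $\mathbf{i} = (i_1, \dots, i_k)$
with $1 \le i_1 < \dots < i_k \le n$, and we will also identify the tuple $\mathbf{i}$ with its set of elements
$\{i_1, \dots, i_k\}$. Hence, for any two elements $\mathbf{i} \in \mathcal{P}(n, k_1)$ and $\mathbf{j} \in \mathcal{P}(n, k_2)$ with $1 \le k_1, k_2 \le n$,
the operations $\mathbf{i} \cup \mathbf{j}$, $\mathbf{i} \cap \mathbf{j}$, and $\mathbf{i} \setminus \mathbf{j}$ give the tuples with increasing components that, as sets,
equal the union, intersection and difference of $\mathbf{i}$ and $\mathbf{j}$, respectively. For $\mathbf{i} \in \mathcal{P}(n, k)$, we let
$\mathbf{X}_{\mathbf{i}}^{(p)} := (X_{i_1}^{(p)}, \dots, X_{i_k}^{(p)})$ and define the rank vector

$\mathbf{R}_{\mathbf{i}}^{(p)} := (R_{\mathbf{i},1}^{(p)}, \dots, R_{\mathbf{i},k}^{(p)})$,

where $R_{\mathbf{i},c}^{(p)}$ is the rank of $X_{i_c}^{(p)}$ among $X_{i_1}^{(p)}, \dots, X_{i_k}^{(p)}$.

Let $p \ne q$ index two distinct variables. Then $\mathbf{X}_c^{(pq)}$ and $\mathbf{R}_c^{(pq)}$ denote the pairs
$(X_c^{(p)}, X_c^{(q)})$ and $(R_c^{(p)}, R_c^{(q)})$, respectively, for $c = 1, \dots, n$. Similarly, given $\mathbf{i} = (i_1, \dots, i_k) \in \mathcal{P}(n, k)$, we let
$\mathbf{X}_{\mathbf{i},c}^{(pq)} := (X_{i_c}^{(p)}, X_{i_c}^{(q)})$ and $\mathbf{R}_{\mathbf{i},c}^{(pq)} := (R_{\mathbf{i},c}^{(p)}, R_{\mathbf{i},c}^{(q)})$ for $c \in \{1, \dots, k\}$. We then define the $k$-tuples
that are observation and rank vectors of pairs:

$\mathbf{R}_{\mathbf{i}}^{(pq)} := (\mathbf{R}_{\mathbf{i},1}^{(pq)}, \dots, \mathbf{R}_{\mathbf{i},k}^{(pq)})$ and $\mathbf{X}_{\mathbf{i}}^{(pq)} := (\mathbf{X}_{\mathbf{i},1}^{(pq)}, \dots, \mathbf{X}_{\mathbf{i},k}^{(pq)})$.

When taking expectations under the null hypothesis $H_0$, we write $E_0[\cdot]$, whereas $E[\cdot]$ is the
general expectation operator, possibly under alternative hypotheses. Similarly, we write $P_0[\cdot]$,
$P[\cdot]$, $\mathrm{Var}_0[\cdot]$, $\mathrm{Var}[\cdot]$, $\mathrm{Cov}_0[\cdot]$ and $\mathrm{Cov}[\cdot]$ for the probability, variance and covariance operators
under $H_0$ and possibly alternatives, respectively. Finally, $\|\cdot\|_\infty$ and $\|\cdot\|_2$ are the max norm
and Euclidean norm for vectors, respectively, and the Frobenius norm of a matrix is denoted
by $\|\cdot\|_F$. For two sequences $(a_n)$ and $(b_n)$, the symbol $a_n \asymp b_n$ is used to indicate the existence
of constants $c, C > 0$ such that $c|a_n| \le |b_n| \le C|a_n|$ for all indices $n$.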


2. Rank correlations as U-statistics 

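This section lays out a rank-based U-statistic framework that encompasses all rank
correlations we will use when constructing specific test statistics for $H_0$ in Section 3. Let

$h : (\mathbb{R}^2)^k \to \mathbb{R}$

be a symmetric function of $k \ge 2$ arguments in $\mathbb{R}^2$, i.e., for all choices of $x_i = (x_i^{(1)}, x_i^{(2)}) \in \mathbb{R}^2$,
$i = 1, \dots, k$, and any permutation $\pi \in \mathfrak{S}_k$, it holds that $h(x_1, \dots, x_k) = h(x_{\pi(1)}, \dots, x_{\pi(k)})$.
For any pair of distinct variable indices $p, q \in \{1, \dots, m\}$, the function $h$ yields a U-statistic

(2.1)    $U_h^{(pq)} := \binom{n}{k}^{-1} \sum_{\mathbf{i} \in \mathcal{P}(n,k)} h(\mathbf{X}_{\mathbf{i},1}^{(pq)}, \dots, \mathbf{X}_{\mathbf{i},k}^{(pq)}) = \binom{n}{k}^{-1} \sum_{\mathbf{i} \in \mathcal{P}(n,k)} h(\mathbf{X}_{\mathbf{i}}^{(pq)})$.

In this context, $h$ is termed the kernel of the U-statistic and is said to be of degree $k$.

Subsequently, we always assume that the kernel $h$ and the induced U-statistics from (2.1)
are rank-based, that is, the kernel has the property that $h(x_1, \dots, x_k) = h(r_1, \dots, r_k)$ for all
arguments $x_1, \dots, x_k \in \mathbb{R}^2$. Here, for each argument $x_i = (x_i^{(1)}, x_i^{(2)}) \in \mathbb{R}^2$ we let
$r_i = (r_i^{(1)}, r_i^{(2)})$ with $r_i^{(j)}$ being the rank of $x_i^{(j)}$ among $x_1^{(j)}, \dots, x_k^{(j)}$ for $j = 1, 2$. If $U_h^{(pq)}$ from (2.1)
is rank-based, then

(2.2)    $U_h^{(pq)} = \binom{n}{k}^{-1} \sum_{\mathbf{i} \in \mathcal{P}(n,k)} h(\mathbf{R}_{\mathbf{i},1}^{(pq)}, \dots, \mathbf{R}_{\mathbf{i},k}^{(pq)}) = \binom{n}{k}^{-1} \sum_{\mathbf{i} \in \mathcal{P}(n,k)} h(\mathbf{R}_{\mathbf{i}}^{(pq)})$.

Note that $(\mathbf{R}_1^{(pq)}, \dots, \mathbf{R}_n^{(pq)})$ uniquely determines all $k$-tuples $(\mathbf{R}_{\mathbf{i},1}^{(pq)}, \dots, \mathbf{R}_{\mathbf{i},k}^{(pq)})$.
The following lemma lists elementary properties of rank-based statistics such as $U_h^{(pq)}$ under $H_0$. It relies on the fact
that under $H_0$ the distribution of $\mathbf{R}_{\mathbf{i}}^{(pq)}$ does not depend on the choice of $\mathbf{i}$, $p$ and $q$ because
the rank vectors $\mathbf{R}^{(1)}, \dots, \mathbf{R}^{(m)}$ are i.i.d. according to a uniform distribution on the symmetric
group $\mathfrak{S}_n$; recall that we assume the original observations to be continuous random vectors such
that ties among the ranks have probability zero. A proof of the lemma is given in Appendix C.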

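Lemma 2.1. Suppose $g(\cdot)$ is a real-valued function defined on $(\mathbb{R}^2)^n$ and for $1 \le p \ne q \le m$,
$g^{(pq)} := g(\mathbf{R}_1^{(pq)}, \dots, \mathbf{R}_n^{(pq)})$ is symmetric in the $n$ arguments $\mathbf{R}_1^{(pq)}, \dots, \mathbf{R}_n^{(pq)}$. The random
variables $g^{(pq)}$ satisfy the following properties under $H_0$:

(i) If $p \ne q$, then $g^{(pq)}$ has the same distribution as $g^{(12)}$.

(ii) If $p \ne q$, then $g^{(pq)}$ is independent of $\mathbf{X}^{(p)}$ (and also independent of $\mathbf{X}^{(q)}$).

(iii) For any fixed $1 \le l \le m$, the $m - 1$ random variables $g^{(pl)}$, $p \ne l$, are mutually
independent.

(iv) If $p \ne q$, $r \ne s$ and $\{p, q\} \ne \{r, s\}$, then $g^{(pq)}$ and $g^{(rs)}$ are independent.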


In this paper we assume all kernel functions $h$ to be bounded. Since $h$ can be recentered if
needed, without loss of generality, we will further assume that $E_0[h(\mathbf{R}_{\mathbf{i}}^{(pq)})] = 0$, a property
exhibited by all the examples below.

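Example 2.1 (Kendall’s tau). If we take $h$ in (2.2) to be the kernel of degree $k = 2$ given by

$h_\tau(r_1, r_2) = \mathrm{sgn}\bigl((r_1^{(1)} - r_2^{(1)})(r_1^{(2)} - r_2^{(2)})\bigr)$,

then $\tau^{(pq)} := U_{h_\tau}^{(pq)}$ is Kendall’s tau, which measures the association of $\mathbf{X}^{(p)}$ and $\mathbf{X}^{(q)}$ by
counting concordant versus discordant pairs of points.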
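To make the rank-based U-statistic view concrete, the following minimal Python sketch evaluates Kendall's tau as the average of the kernel $h_\tau$ over all index pairs. The function names `ranks`, `h_tau` and `kendall_tau` are illustrative choices, not notation from this paper, and the code assumes continuous data without ties. It uses ranks computed over the full sample, which is harmless here because the sign of a difference of ranks does not depend on whether ranks are taken within the pair or within the whole sample.

```python
import itertools
import numpy as np


def ranks(x):
    """Ranks 1..n of a 1-D array, assuming there are no ties."""
    x = np.asarray(x)
    r = np.empty(len(x), dtype=int)
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r


def h_tau(r1, r2):
    """Kendall kernel on two rank pairs r1 = (r1^(1), r1^(2)), r2 = (r2^(1), r2^(2))."""
    return np.sign((r1[0] - r2[0]) * (r1[1] - r2[1]))


def kendall_tau(xp, xq):
    """U-statistic with kernel h_tau: average of the kernel over all pairs i < j."""
    rp, rq = ranks(xp), ranks(xq)
    n = len(rp)
    vals = [h_tau((rp[i], rq[i]), (rp[j], rq[j]))
            for i, j in itertools.combinations(range(n), 2)]
    return float(np.mean(vals))
```

Because only ranks enter, applying a strictly increasing transformation to either variable leaves the value unchanged.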

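Example 2.2 (Spearman’s rho). Let

(2.3)    $\rho_s^{(pq)} := 1 - \frac{6}{n(n^2 - 1)} \sum_{i=1}^{n} (R_i^{(p)} - R_i^{(q)})^2$

be the Spearman’s rank correlation coefficient (rho) between $\mathbf{X}^{(p)}$ and $\mathbf{X}^{(q)}$. Define
$\hat\rho_s^{(pq)} := U_{h_{\hat\rho_s}}^{(pq)}$, where $h_{\hat\rho_s}$ is the kernel function of degree 3 given by

(2.4)    $h_{\hat\rho_s}(r_1, r_2, r_3) = \frac{1}{2} \sum_{\pi \in \mathfrak{S}_3} \mathrm{sgn}(r_{\pi(1)}^{(1)} - r_{\pi(2)}^{(1)})\, \mathrm{sgn}(r_{\pi(1)}^{(2)} - r_{\pi(3)}^{(2)})$.

Hoeffding (1948a, p.318) showed that

(2.5)    $\rho_s^{(pq)} = \frac{n - 2}{n + 1}\, \hat\rho_s^{(pq)} + \frac{3}{n + 1}\, \tau^{(pq)}$.

Hence, the dominating term of Spearman’s rho is a U-statistic.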
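The decomposition of Spearman's rho into a degree-3 U-statistic plus a multiple of Kendall's tau is an exact algebraic identity for data without ties, so it can be checked numerically. The sketch below is only an illustration under that no-ties assumption: the helper names (`spearman_rho`, `rho_hat`, `h_rho_hat`, `kendall_tau`) are hypothetical, and the brute-force loops over all pairs and triples are meant for small $n$ only.

```python
import itertools
import numpy as np


def ranks(x):
    """Ranks 1..n of a 1-D array, assuming there are no ties."""
    x = np.asarray(x)
    r = np.empty(len(x), dtype=int)
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r


def spearman_rho(xp, xq):
    """Classical Spearman's rho from the sum of squared rank differences."""
    n = len(xp)
    d = ranks(xp) - ranks(xq)
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))


def kendall_tau(xp, xq):
    """Kendall's tau as an average of sign products over all pairs i < j."""
    xp, xq = np.asarray(xp), np.asarray(xq)
    pairs = itertools.combinations(range(len(xp)), 2)
    return float(np.mean([np.sign((xp[i] - xp[j]) * (xq[i] - xq[j]))
                          for i, j in pairs]))


def h_rho_hat(xs, ys):
    """Degree-3 kernel: half the sum over permutations of sign products.

    Signs of differences of raw values equal signs of differences of ranks.
    """
    return 0.5 * sum(np.sign(xs[a] - xs[b]) * np.sign(ys[a] - ys[c])
                     for a, b, c in itertools.permutations(range(3)))


def rho_hat(xp, xq):
    """U-statistic of degree 3 that forms the dominating term of Spearman's rho."""
    xp, xq = np.asarray(xp), np.asarray(xq)
    triples = itertools.combinations(range(len(xp)), 3)
    return float(np.mean([h_rho_hat(xp[list(t)], xq[list(t)]) for t in triples]))
```

For a sample of size $n$, comparing `spearman_rho(x, y)` with `(n - 2) / (n + 1) * rho_hat(x, y) + 3 / (n + 1) * kendall_tau(x, y)` should give agreement up to floating-point error.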
Example 2.3 (Hoeffding’s D statistic). Let

$h_D(r_1, \dots, r_5) = \frac{1}{5!} \sum_{\pi \in \mathfrak{S}_5} \frac{1}{4}\, \phi(r_{\pi(1)}^{(1)}, \dots, r_{\pi(5)}^{(1)})\, \phi(r_{\pi(1)}^{(2)}, \dots, r_{\pi(5)}^{(2)})$,

where

$\phi(r_1, \dots, r_5) = (I(r_1 \ge r_2) - I(r_1 \ge r_3))(I(r_1 \ge r_4) - I(r_1 \ge r_5))$

and $I(\cdot)$ is the indicator function. Hoeffding (1948b) suggested the statistic $D^{(pq)} := U_{h_D}^{(pq)}$
to measure association between the vectors $\mathbf{X}^{(p)}$ and $\mathbf{X}^{(q)}$. When the joint distribution of
$(X^{(p)}, X^{(q)})$ has continuous joint and marginal densities, the expectation

$E[h_D(\mathbf{X}_1^{(pq)}, \dots, \mathbf{X}_5^{(pq)})]$

is zero if and only if $X^{(p)}$ and $X^{(q)}$ are independent (Hoeffding, 1948b, Theorem 3.1).

Example 2.4 (Bergsma and Dassios’ $\tau^*$). In a recent paper, Bergsma and Dassios (2014)
introduced $\tau^{*(pq)} := U_{h_{\tau^*}}^{(pq)}$, a U-statistic of degree 4 with the kernel

$h_{\tau^*}(r_1, \dots, r_4) = \frac{1}{4!} \sum_{\pi \in \mathfrak{S}_4} \phi(r_{\pi(1)}^{(1)}, \dots, r_{\pi(4)}^{(1)})\, \phi(r_{\pi(1)}^{(2)}, \dots, r_{\pi(4)}^{(2)})$,

where now

$\phi(r_1, \dots, r_4) = I(r_1, r_3 < r_2, r_4) + I(r_1, r_3 > r_2, r_4) - I(r_1, r_2 < r_3, r_4) - I(r_1, r_2 > r_3, r_4)$.

According to Theorem 1 in Bergsma and Dassios (2014), $\tau^*$ is an improvement over Hoeffding’s
$D$ in the sense that the vanishing of $E[h_{\tau^*}(\mathbf{X}_1^{(pq)}, \dots, \mathbf{X}_4^{(pq)})]$ characterizes the independence of
$X^{(p)}$ and $X^{(q)}$ under the weaker assumption that $(X^{(p)}, X^{(q)})$ has a bivariate distribution that
is discrete or (absolutely) continuous, or a mixture of both. In fact, in their paper Bergsma
and Dassios (2014) conjectured that even this assumption is not necessary.

Returning to our general setup, the variance and also the large-sample behavior of the
statistic $U_h^{(pq)}$ is determined by the covariance quantities

(2.6)    $\zeta_c^h := \mathrm{Cov}\bigl[h(\mathbf{R}_{\mathbf{i}}^{(pq)}),\, h(\mathbf{R}_{\mathbf{j}}^{(pq)})\bigr]$,    $c = 0, \dots, k$,

where $\mathbf{i}, \mathbf{j} \in \mathcal{P}(n, k)$ are such that $|\mathbf{i} \cap \mathbf{j}| = c$. When $H_0$ is true,

(2.7)    $\zeta_c^h = E_0\bigl[h(\mathbf{R}_{\mathbf{i}}^{(pq)})\, h(\mathbf{R}_{\mathbf{j}}^{(pq)})\bigr]$,

as we are assuming that $E_0[h(\mathbf{R}_{\mathbf{i}}^{(pq)})] = 0$. Furthermore, the value of $\zeta_c^h$ does not depend on
the choice of $(\mathbf{i}, p, q)$ under $H_0$. In the sequel, it will be clear from the context whether $\zeta_c^h$ is
defined under $H_0$ or an alternative hypothesis.

It is well known that $0 = \zeta_0^h \le \zeta_1^h \le \dots \le \zeta_k^h$, and the kernel $h$ is said to have order of
degeneracy $d$ if $\zeta_0^h = \zeta_1^h = \dots = \zeta_{d-1}^h = 0$ and $\zeta_d^h > 0$ (Serfling, 1980, Chapter 5). If $d \ge 2$, the
kernel and the U-statistic it defines are referred to as degenerate. For any $c = 1, \dots, k$, it holds
under $H_0$ that

(2.8)    $\zeta_c^h = 0 \iff E_0\bigl[h(\mathbf{R}_{\mathbf{i}}^{(pq)}) \mid \mathbf{X}_{\mathbf{i}'}^{(pq)}\bigr] = 0$, almost surely,

where $\mathbf{i}' \subset \mathbf{i}$ may be any subset with $|\mathbf{i}'| = c$. In particular, for the kernels $h_D$ and $h_{\tau^*}$, the
right-hand side of (2.8) holds with $c \le 1$.

As in the classical theory of U-statistics, $\zeta_d^h$ will play a role in our asymptotic results for
the test statistics we construct from rank-based U-statistics, for which the kernels have order
of degeneracy $d = 1$ or $d = 2$ under $H_0$. However, when $d = 2$, an additional quantity is
needed to describe our asymptotic results. For a symmetric kernel $h : (\mathbb{R}^2)^k \to \mathbb{R}$ with order
of degeneracy $d = 2$ under $H_0$, we define

(2.9)    $\eta^h := E_0\bigl[h(\mathbf{R}_{\mathbf{i}^1}^{(pq)})\, h(\mathbf{R}_{\mathbf{i}^2}^{(pq)})\, h(\mathbf{R}_{\mathbf{i}^3}^{(pq)})\, h(\mathbf{R}_{\mathbf{i}^4}^{(pq)})\bigr]$,

where $\mathbf{i}^1, \mathbf{i}^2, \mathbf{i}^3, \mathbf{i}^4 \in \mathcal{P}(n, k)$ are any four tuples such that

(i) $|\bigcup_{\omega=1}^{4} \mathbf{i}^\omega| = 4k - 4$,

(ii) $|\mathbf{i}^1 \cap \mathbf{i}^2| = |\mathbf{i}^2 \cap \mathbf{i}^3| = |\mathbf{i}^3 \cap \mathbf{i}^4| = |\mathbf{i}^4 \cap \mathbf{i}^1| = 1$, and

(iii) no index $i \in \bigcup_{\omega=1}^{4} \mathbf{i}^\omega$ is an element of more than two of the sets $\mathbf{i}^1, \dots, \mathbf{i}^4$.

For our purpose we only need to define $\eta^h$ under $H_0$, and it is also easy to see that the choice of
$p$, $q$, $\mathbf{i}^\omega$, $\omega = 1, \dots, 4$, does not matter in its definition. Table 1 collects the order of degeneracy
$d$ under $H_0$, and the quantities $\zeta_d^h$ and $\eta^h$ for the kernels in Examples 2.1-2.4. The latter are
found in Hoeffding (1948a,b), and by our own calculations.

Table 1. Degree $k$, order of degeneracy $d$, covariance $\zeta_d^h$ and fourth moment
$\eta^h$ for the kernel functions in Examples 2.1-2.4 when independence holds.

Kernel         $h_\tau$    $h_{\hat\rho_s}$    $h_D$             $h_{\tau^*}$
$k$            2           3                   5                 4
$d$            1           1                   2                 2
$\zeta_d^h$    1/9         1/9                 1/810000          1/225
$\eta^h$       -           -                   $(7/864000)^2$    $(2/525)^2$

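Finally, it is easy to check that all the kernels in Examples 2.1-2.4 satisfy the following
property that will be assumed for our null asymptotic results.

Assumption 2.2. Let $h : (\mathbb{R}^2)^k \to \mathbb{R}$ be a symmetric kernel with order of degeneracy $d \ge 1$
under $H_0$. Then given $\mathbf{i} = (i_1, \dots, i_k) \in \mathcal{P}(n, k)$ and $1 \le p \ne q \le m$,

$E_0\bigl[h(\mathbf{R}_{\mathbf{i}}^{(pq)}) \mid \mathbf{X}_{\mathbf{j}}^{(p)}, \mathbf{X}_{\mathbf{j}'}^{(q)}\bigr] = 0$

for all $\mathbf{j}, \mathbf{j}' \subset \mathbf{i}$ such that $\min(|\mathbf{j}|, |\mathbf{j}'|) < d$.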


3. Test statistics 


We now proceed to construct test statistics for the independence hypothesis $H_0$ from (1.1).
Building on the pairwise rank correlations from Section 2, we introduce general classes of
statistics and derive their respective asymptotic null distributions when $m, n \to \infty$.

3.1. Sum of squared sample rank correlations. Let $U_h^{(pq)}$ be a rank-based U-statistic as
defined in (2.2), with mean zero when $X^{(p)}$ and $X^{(q)}$ are independent. Suppose further that
large absolute values of $U_h^{(pq)}$ indicate strong association (positive or negative) between
$X^{(p)}$ and $X^{(q)}$. It is then natural to reject $H_0$ for large values of the centered quantity

(3.1)    $S_h := \sum_{1 \le p < q \le m} (U_h^{(pq)})^2 - \binom{m}{2} \mu_h$.

Here, $\mu_h := E_0[(U_h^{(pq)})^2]$. Note that, as indicated by our notation, this expectation does not
depend on the choice of $p$ and $q$ by Lemma 2.1(i). The following lemma specifies $\mu_h$ and gives
a result on other moments of $U_h^{(pq)}$ that will be used later.


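Lemma 3.1. Let $n > 2k > 2$, and suppose that $U_h^{(pq)}$ from (2.2) has a kernel $h$ with order of
degeneracy $d$ under $H_0$. Then the following three facts hold under $H_0$:

(i)    $\mu_h = \binom{n}{k}^{-1} \sum_{c=1}^{k} \binom{k}{c} \binom{n-k}{k-c}\, \zeta_c^h$.

(ii) For any $r \ge 2$,

$E_0\bigl[(U_h^{(pq)})^r\bigr] = O\bigl(n^{-\lfloor (rd+1)/2 \rfloor}\bigr)$,

where $\lfloor \cdot \rfloor$ denotes the floor function.

(iii)    $E_0\bigl[(U_h^{(pq)})^4\bigr] = \frac{3 k^4 (\zeta_1^h)^2}{n^2} + O(n^{-3})$ if $d = 1$, and
         $E_0\bigl[(U_h^{(pq)})^4\bigr] = \frac{12 \binom{k}{2}^4 \{(\zeta_2^h)^2 + 4\eta^h\}}{n^4} + O(n^{-5})$ if $d = 2$.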
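As a quick sanity check of part (i), consider Kendall's kernel $h_\tau$, for which $k = 2$ and $d = 1$. Table 1 gives $\zeta_1^{h_\tau} = 1/9$, and we additionally use that $\zeta_2^{h_\tau} = E_0[h_\tau^2] = 1$ because $h_\tau$ only takes the values $\pm 1$ when there are no ties (this second value is our own remark, not an entry of Table 1). Substituting into the variance formula gives:

```latex
\mu_\tau
  = \binom{n}{2}^{-1}
    \left[ \binom{2}{1}\binom{n-2}{1}\,\frac{1}{9}
         + \binom{2}{2}\binom{n-2}{0}\cdot 1 \right]
  = \frac{2}{n(n-1)} \cdot \frac{2(n-2) + 9}{9}
  = \frac{2(2n+5)}{9\,n(n-1)} .
```

This is the familiar null variance of Kendall's tau.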


For Lemma 3.1(i) and (ii), see Lemma 5.2.1A and 5.2.2B in Serfling (1980). The last claim
about the leading term of the fourth moment is proven in Appendix D. Let $\mu_\tau$, $\mu_{\hat\rho_s}$, $\mu_D$ and
$\mu_{\tau^*}$ be the values of $\mu_h$ when $h$ is equal to $h_\tau$, $h_{\hat\rho_s}$, $h_D$ and $h_{\tau^*}$ respectively. Then

$\mu_\tau = \frac{2(2n + 5)}{9n(n - 1)}$,    $\mu_{\hat\rho_s} = \frac{n^2 - 3}{n(n - 1)(n - 2)}$,

$\mu_D = \frac{1}{900} \cdot \frac{2(n^2 + 5n - 32)}{9n(n - 1)(n - 3)(n - 4)}$,    $\mu_{\tau^*} = \frac{8(3n^2 + 5n - 18)}{75\, n(n - 1)(n - 2)(n - 3)}$.

The first three quantities can be found in Hoeffding (1948a, §9). The stated value of $\mu_{\tau^*}$ is based
on our own calculations.

3.2. Unbiased estimator of the sum of squared population correlations. The kernel
function $h$ is central to the role of $U_h^{(pq)}$ as a measure of association between the vectors of
observations $\mathbf{X}^{(p)}$ and $\mathbf{X}^{(q)}$. At the population level, the association (positive or negative) is
captured by the expectation of $U_h^{(pq)}$, which is also equal to

(3.2)    $\theta_h^{(pq)} := E\bigl[h(\mathbf{R}_{\mathbf{j}}^{(pq)})\bigr]$,

where $\mathbf{j}$ may be any element in $\mathcal{P}(n, k)$. Hence,

(3.3)    $\sum_{1 \le p < q \le m} (\theta_h^{(pq)})^2$

is a population measure of overall dependency in the joint distribution of $X^{(1)}, \dots, X^{(m)}$. As an
alternative approach to Section 3.1, we now construct an unbiased estimator of (3.3), targeting
more directly the problem of global (in-)dependence.

Recall that given $\mathbf{i} \in \mathcal{P}(n, 2k)$ and $\mathbf{j} \in \mathcal{P}(n, k)$ such that $\mathbf{j} \subset \mathbf{i}$ as sets, $\mathbf{i} \setminus \mathbf{j}$ is the $k$-tuple in
$\mathcal{P}(n, k)$ that is given by their set difference. The function

(3.4)    $h^W(\mathbf{R}_{\mathbf{i}}^{(pq)}) := \binom{2k}{k}^{-1} \sum_{\mathbf{j} \subset \mathbf{i},\ \mathbf{j} \in \mathcal{P}(n,k)} h(\mathbf{R}_{\mathbf{j}}^{(pq)})\, h(\mathbf{R}_{\mathbf{i} \setminus \mathbf{j}}^{(pq)})$,

defined on the domain $(\mathbb{R}^2)^{2k}$, is symmetric in its $2k$ arguments $\mathbf{R}_{\mathbf{i},1}^{(pq)}, \dots, \mathbf{R}_{\mathbf{i},2k}^{(pq)}$, due to the
symmetry of $h$ and the summation over all possible tuples $\mathbf{j} \in \mathcal{P}(n, k)$ contained in $\mathbf{i}$ on the
right hand side of (3.4). Moreover, $h^W$ is an unbiased estimator of the square of the expectation
in (3.2), since each summand on the right hand side of (3.4) is a product of two independent
unbiased estimators of $\theta_h^{(pq)}$. Therefore, defining the U-statistic

(3.5)    $W_h^{(pq)} := \binom{n}{2k}^{-1} \sum_{\mathbf{i} \in \mathcal{P}(n,2k)} h^W(\mathbf{R}_{\mathbf{i}}^{(pq)})$,

we have that the sum

(3.6)    $T_h := \sum_{1 \le p < q \le m} W_h^{(pq)}$

is an unbiased estimator of (3.3). The statistic $T_h$ is a U-statistic itself and serves as a natural
test statistic for $H_0$. Large values of $T_h$ indicate departures from $H_0$. When $h = h_\tau$, i.e., the
case of Kendall’s tau, $T_h$ equals the statistic displayed in (1.7) in the introduction.
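For Kendall's kernel ($k = 2$) the split-based construction of the unbiased estimator and the permutation form shown in the introduction can be compared directly on a small sample. The following brute-force Python sketch does this for a single variable pair; it is an illustration only, with hypothetical function names (`w_via_splits`, `w_via_permutations`), it assumes data without ties, and its cost grows like $n^4$, so it is not meant as a practical implementation.

```python
import itertools
from math import comb

import numpy as np


def h_tau(x, y, a, b):
    """Kendall kernel evaluated on observations a and b (sign of the product of differences)."""
    return np.sign((x[a] - x[b]) * (y[a] - y[b]))


def w_via_splits(x, y):
    """Average over 4-subsets i of the symmetrized kernel: mean over 2-subsets j of i
    of h(j) * h(i minus j)."""
    n = len(x)
    total = 0.0
    for i in itertools.combinations(range(n), 4):
        acc = 0.0
        for j in itertools.combinations(i, 2):
            rest = tuple(c for c in i if c not in j)
            acc += h_tau(x, y, *j) * h_tau(x, y, *rest)
        total += acc / comb(4, 2)
    return total / comb(n, 4)


def w_via_permutations(x, y):
    """Same quantity written as a sum over 4-subsets and all 4! orderings of each subset."""
    n = len(x)
    total = 0.0
    for i in itertools.combinations(range(n), 4):
        for pi in itertools.permutations(i):
            total += h_tau(x, y, pi[0], pi[1]) * h_tau(x, y, pi[2], pi[3])
    return total / (24 * comb(n, 4))
```

The two functions agree because each unordered split of a 4-subset into two pairs corresponds to 8 of the 24 orderings and to 2 of the 6 choices of $\mathbf{j}$, so both normalizations weight every split by 1/3.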

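Clearly, $W_h^{(pq)}$ is a rank-based U-statistic with the kernel $h^W$ of degree $2k$. The following
lemma summarizes the degeneracy properties of $h^W$ under $H_0$.

Lemma 3.2. Suppose $h : (\mathbb{R}^2)^k \to \mathbb{R}$ is a symmetric kernel function of degree $k$ with order
of degeneracy $d \in \{1, 2\}$ under $H_0$. So, $\zeta_d^h > 0$. Then, under $H_0$, the induced symmetric kernel
function $h^W$ defined in (3.4) has order of degeneracy $2d$ and

$\zeta_{2d}^{h^W} = E_0\bigl[h^W(\mathbf{R}_{\mathbf{i}}^{(pq)})\, h^W(\mathbf{R}_{\mathbf{j}}^{(pq)})\bigr] = \binom{2k}{2}^{-2} k^4 (\zeta_1^h)^2$ if $d = 1$, and
$\zeta_{2d}^{h^W} = \frac{1}{3} \binom{2k}{4}^{-2} \binom{k}{2}^4 \{(\zeta_2^h)^2 + 2\eta^h\}$ if $d = 2$,

where $\mathbf{i}, \mathbf{j} \in \mathcal{P}(n, 2k)$ and $|\mathbf{i} \cap \mathbf{j}| = 2d$.

The proof of the lemma is deferred to Appendix D.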


3.3. Sum of sample rank correlations. For testing $H_0$ it is also interesting to consider the
simple sum

(3.7)    $Z_h := \sum_{1 \le p < q \le m} U_h^{(pq)}$,

which unbiasedly estimates the signal

(3.8)    $\sum_{1 \le p < q \le m} \theta_h^{(pq)}$;

compare with (3.3). When the kernel $h$ is $h_{\hat\rho_s}$ or $h_\tau$, without the squaring as in (3.3), the sum (3.8)
may not be an effective measure for the overall dependency of $X^{(1)}, \dots, X^{(m)}$ since any pairwise
signal $\theta_h^{(pq)}$ can be either negative or positive depending on the direction of association (Kruskal,
1958). Hence, the rejection of $H_0$ for large values of $Z_h$ is only good for testing against the
“one-sided” alternative

$\sum_{1 \le p < q \le m} \theta_h^{(pq)} > 0$,    $\theta_h^{(pq)} \ge 0$ for all $p < q$.

However, when $h = h_{\tau^*}$ or $h = h_D$, the sum (3.8) is an effective measure of the overall dependency
of $X^{(1)}, \dots, X^{(m)}$ since any pairwise signal $\theta_h^{(pq)}$ is non-negative and equals zero if and only if
$X^{(p)}$ and $X^{(q)}$ are independent under the weak assumptions in the work of Hoeffding (1948b)
and Bergsma and Dassios (2014). In this case, large values of $Z_h$ detect dependency among
$X^{(1)}, \dots, X^{(m)}$ without any restrictions to the direction of the pairwise associations.
4. Asymptotic null distributions


We are now ready to state our results on the asymptotic distributions for the test statistics
introduced in Section 3. As mentioned in Section 2, we focus on rank-based U-statistics with
a kernel $h$ satisfying Assumption 2.2 and order of degeneracy $d \in \{1, 2\}$ under $H_0$.

Theorem 4.1. Suppose the null hypothesis $H_0$ from (1.1) is true. Let $h$ be a symmetric
bounded kernel function of degree $k$ satisfying Assumption 2.2 and consider the asymptotic
regime $m, n \to \infty$. If $d = 1$, after suitable rescaling, $S_h$, $T_h$ and $Z_h$ are asymptotically
normal, namely,

$\frac{n S_h}{k^2 m\, \zeta_1^h}$,    $\frac{n T_h}{k^2 m\, \zeta_1^h}$,    $\frac{\sqrt{2n}\, Z_h}{k m \sqrt{\zeta_1^h}}$    $\xrightarrow{d} N(0, 1)$.

If $d = 2$, then

$\frac{n^2 S_h}{2 \binom{k}{2}^2 m \sqrt{(\zeta_2^h)^2 + 6\eta^h}}$,    $\frac{n^2 T_h}{2 \binom{k}{2}^2 m \sqrt{(\zeta_2^h)^2 + 2\eta^h}}$,    $\frac{n Z_h}{\binom{k}{2} m \sqrt{\zeta_2^h}}$    $\xrightarrow{d} N(0, 1)$.

The theorem covers in particular the rank correlations from Examples 2.1-2.4. A critical
value for an approximate $\alpha$-size test can thus be calibrated based on normal quantiles. As in
the classical theory for U-statistics, the rescaling factors for the non-degenerate and degenerate
cases differ in order; for instance, we have to multiply $S_h$ with a factor of order $O(n/m)$ when
$h$ has order of degeneracy $d = 1$, and with a factor of order $O(n^2/m)$ when $h$ has order of
degeneracy $d = 2$. The ingredients needed to compute the rescaling factors were given in
Table 1. In slight abbreviation, we write $S_\tau$, $S_{\hat\rho_s}$, $S_D$ and $S_{\tau^*}$ for the four versions of the
statistic $S_h$ from (3.1) with the different kernels reviewed in Section 2, and analogously, $T_\tau$,
$T_{\hat\rho_s}$, $T_D$, $T_{\tau^*}$ and $Z_\tau$, $Z_{\hat\rho_s}$, $Z_D$, $Z_{\tau^*}$ for the versions of $T_h$ and $Z_h$ from (3.6) and (3.7). This
matches the notation used in (1.5), (1.7) and (1.8).
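As an illustration of how the normal calibration could be used in practice, the following Python sketch computes the centered sum of squared Kendall's taus, standardizes it with the $d = 1$ scaling $n S_\tau / (k^2 m \zeta_1)$ using $k = 2$ and $\zeta_1 = 1/9$ from Table 1, and returns a one-sided p-value. It is a minimal sketch under stated assumptions, not the implementation used for the experiments in this paper: the function names are hypothetical, the data are assumed to have no ties, and the vectorized sign matrices need memory of order $m n^2$.

```python
import math

import numpy as np


def kendall_tau_matrix(X):
    """m x m matrix of pairwise Kendall's tau for an n x m data matrix without ties."""
    n, m = X.shape
    # signs[p] holds sgn(X_i^(p) - X_j^(p)) over all ordered pairs (i, j), flattened
    signs = np.sign(X.T[:, :, None] - X.T[:, None, :]).reshape(m, n * n)
    # each unordered pair is counted twice, hence the divisor n (n - 1)
    return signs @ signs.T / (n * (n - 1))


def mu_tau(n):
    """Null second moment of Kendall's tau: 2 (2n + 5) / (9 n (n - 1))."""
    return 2.0 * (2 * n + 5) / (9.0 * n * (n - 1))


def s_tau_test(X):
    """Return (S, z, p): centered statistic, standardized value, one-sided p-value."""
    n, m = X.shape
    tau = kendall_tau_matrix(X)
    upper = np.triu_indices(m, k=1)
    s = float(np.sum(tau[upper] ** 2)) - math.comb(m, 2) * mu_tau(n)
    # scaling for d = 1 with k = 2 and zeta_1 = 1/9: n * S / (k^2 * m * zeta_1)
    z = n * s / (4.0 * m / 9.0)
    # reject for large values: p = 1 - Phi(z)
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return s, z, p_value
```

A call such as `s_tau_test(X)` on an `n x m` NumPy array returns the p-value in the third position; independence would be rejected at level $\alpha$ when that value falls below $\alpha$.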

We remark that while the classical Spearman’s rho is not a U-statistic, one may of course
consider the centered test statistic

(4.1)    $S_{\rho_s} := \sum_{1 \le p < q \le m} (\rho_s^{(pq)})^2 - \binom{m}{2} \mu_{\rho_s}$,

where $\mu_{\rho_s} := E_0[(\rho_s^{(pq)})^2] = 1/(n - 1)$; see Hoeffding (1948a, p.321). The convergence of
$\frac{n}{m} S_{\hat\rho_s}$ to a standard normal distribution, as suggested by Theorem 4.1 and Table 1, implies the
following distributional convergence for $S_{\rho_s}$. Its proof, given in Appendix E, makes use of the
decomposition from (2.5). The same result has been obtained by Zhou (2007) and Wang et al.
(2013) via different methods.

Corollary 4.2. Under $H_0$, $\frac{n}{m} S_{\rho_s} \xrightarrow{d} N(0, 1)$ as $m, n \to \infty$.

Our proof of Theorem 4.1 is based on a central limit theorem for martingale arrays (Hall and
Heyde, 1980, Corollary 3.1) that was also applied by Schott (2005). We outline the approach
here, postponing computations verifying the conditions of the martingale CLT to Appendix E.

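Proof of Theorem 4.1. Fix a sample size $n$. For $q = 1, \dots, m$, let $\mathcal{F}_{nq}$ be the $\sigma$-algebra generated
by $\mathbf{X}^{(1)}, \dots, \mathbf{X}^{(q)}$ (or for our purposes, equivalently, $\mathbf{R}^{(1)}, \dots, \mathbf{R}^{(q)}$) under $H_0$. For convenience
we will use the shorthand $\bar U_h^{(pq)} := (U_h^{(pq)})^2 - \mu_h$ for $1 \le p < q \le m$. Let

(4.2)    $D_{nq}^S := \sum_{p=1}^{q-1} \bar U_h^{(pq)}$,    $D_{nq}^T := \sum_{p=1}^{q-1} W_h^{(pq)}$,    and    $D_{nq}^Z := \sum_{p=1}^{q-1} U_h^{(pq)}$,

and set $D_{n1}^S = D_{n1}^T = D_{n1}^Z = 0$. Writing $S_{nq} = \sum_{l=1}^{q} D_{nl}^S$, $T_{nq} = \sum_{l=1}^{q} D_{nl}^T$ and
$Z_{nq} = \sum_{l=1}^{q} D_{nl}^Z$, we have that $S_h = S_{nm}$, $T_h = T_{nm}$ and $Z_h = Z_{nm}$.

We claim that, for each $n$, the sequences

(4.3)    $\{S_{nq}, \mathcal{F}_{nq}, 1 \le q \le m\}$,    $\{T_{nq}, \mathcal{F}_{nq}, 1 \le q \le m\}$    and    $\{Z_{nq}, \mathcal{F}_{nq}, 1 \le q \le m\}$

are martingales, i.e., $E_0[S_{nq} \mid \mathcal{F}_{n,q-1}] = S_{n,q-1}$, $E_0[T_{nq} \mid \mathcal{F}_{n,q-1}] = T_{n,q-1}$ and
$E_0[Z_{nq} \mid \mathcal{F}_{n,q-1}] = Z_{n,q-1}$ for $q = 2, \dots, m$. Given the way $S_{nq}$, $T_{nq}$ and $Z_{nq}$ are defined as sums, it suffices to
show that

(4.4)    $E_0\bigl[\bar U_h^{(pq)} \mid \mathcal{F}_{n,q-1}\bigr] = E_0\bigl[W_h^{(pq)} \mid \mathcal{F}_{n,q-1}\bigr] = E_0\bigl[U_h^{(pq)} \mid \mathcal{F}_{n,q-1}\bigr] = 0$

for all $1 \le p < q \le m$. Since $\mathbf{X}^{(1)}, \dots, \mathbf{X}^{(m)}$ are independent under $H_0$, conditioning on $\mathcal{F}_{n,q-1}$
is the same as conditioning on $\mathbf{X}^{(p)}$ alone in (4.4). As $\bar U_h^{(pq)}$, $W_h^{(pq)}$ and $U_h^{(pq)}$ are all symmetric
functions of the $n$ arguments $\mathbf{R}_1^{(pq)}, \dots, \mathbf{R}_n^{(pq)}$, (4.4) follows from Lemma 2.1(i) and (ii).

By the boundedness of our kernel $h$, each of the martingales in (4.3) is trivially
square-integrable. As such, the central limit theorem for martingale arrays from Corollary 3.1 in Hall
and Heyde (1980) implies the assertion of Theorem 4.1 if we can show that the squares of the
martingale differences $D_{nl}^S$, $D_{nl}^T$ and $D_{nl}^Z$ each satisfy the following two conditions. The first
condition requires that as $m, n \to \infty$,

(4.5)    $\frac{n^2}{m^2} \sum_{l=2}^{m} E_0\bigl[(D_{nl}^S)^2 \mid \mathcal{F}_{n,l-1}\bigr]$,    $\frac{n^2}{m^2} \sum_{l=2}^{m} E_0\bigl[(D_{nl}^T)^2 \mid \mathcal{F}_{n,l-1}\bigr]$    $\xrightarrow{p} k^4 (\zeta_1^h)^2$,

(4.6)    $\frac{n}{m^2} \sum_{l=2}^{m} E_0\bigl[(D_{nl}^Z)^2 \mid \mathcal{F}_{n,l-1}\bigr]$    $\xrightarrow{p} \frac{k^2 \zeta_1^h}{2}$

for $d = 1$, and

(4.7)    $\frac{n^4}{m^2} \sum_{l=2}^{m} E_0\bigl[(D_{nl}^S)^2 \mid \mathcal{F}_{n,l-1}\bigr]$    $\xrightarrow{p} 4 \binom{k}{2}^4 \{(\zeta_2^h)^2 + 6\eta^h\}$,

(4.8)    $\frac{n^4}{m^2} \sum_{l=2}^{m} E_0\bigl[(D_{nl}^T)^2 \mid \mathcal{F}_{n,l-1}\bigr]$    $\xrightarrow{p} 4 \binom{k}{2}^4 \{(\zeta_2^h)^2 + 2\eta^h\}$,

(4.9)    $\frac{n^2}{m^2} \sum_{l=2}^{m} E_0\bigl[(D_{nl}^Z)^2 \mid \mathcal{F}_{n,l-1}\bigr]$    $\xrightarrow{p} \binom{k}{2}^2 \zeta_2^h$

for $d = 2$, where the convergence symbol $\xrightarrow{p}$ stands for convergence in probability. The second
condition is a Lindeberg condition. In Lemma E.1 in Appendix E we show that, in fact,
(4.5)-(4.9) also hold in the stronger sense of $L^2$ (or quadratic mean). Lemma E.2 proves a
Lyapunov condition that implies the Lindeberg condition, which completes the proof of
Theorem 4.1. □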


5. Aspects of power

In order to investigate the power of our tests we adopt an asymptotic minimax perspective.
While our null distributional results in Section 4 are valid under the more general asymptotic
regime $m, n \to \infty$, we treat here the particular regime $m/n \to \gamma \in (0, \infty)$. Recall the definition
in (3.2), and let $\Theta = (\theta_h^{(pq)})_{1 \le p < q \le m}$ be the $\binom{m}{2}$-vector comprising all these pairwise measures
of association. In our exploration of power, it is at times convenient to have U-statistics with
a kernel $h$ of degree 2. For instance, we apply results for U-statistics of degree 2 from Chen
(2016). Consequently, our power analysis focuses on the two classes of statistics $S_h$ and $T_h$ for
the kernel $h = h_\tau$ of Kendall’s tau. To indicate this restriction, we write $\theta_\tau^{(pq)} := E[h_\tau(\mathbf{R}_{\mathbf{i}}^{(pq)})]$
for $\mathbf{i} \in \mathcal{P}(n, 2)$ and $\Theta_\tau = (\theta_\tau^{(pq)})_{1 \le p < q \le m}$.

Let $\mathcal{D}_m$ be a family of continuous joint distributions on $\mathbb{R}^m$ containing all $m$-variate
Gaussian distributions, to be considered as joint distributions for $X^{(1)}, \dots, X^{(m)}$. For a given
significance level $\alpha \in (0, 1)$, we study which sequences of lower bounds $\varepsilon_n$ on the dependency
signal $\|\Theta_\tau\|_2$ allow tests to uniformly achieve a fixed power $\beta > \alpha$ over the set of alternative
distributions

(5.1)    $\mathcal{D}_m(\|\Theta_\tau\|_2 \ge \varepsilon_n) := \{D \in \mathcal{D}_m : \|\Theta_\tau\|_2 \ge \varepsilon_n\}$.

As usual, we take a test $\phi$ to be a function mapping the data into the unit interval $[0, 1]$. Given
a test statistic $S = S(X_1, \dots, X_n)$, we write $\phi_\alpha(S)$ for the test that rejects for large values of
$S$ and has (asymptotic) size $\alpha$.

The statistics $S_\tau$ and $T_\tau$ estimate the squared Euclidean norm of the signal, $\|\Theta_\tau\|_2^2$. They are
thus natural when the interest is in detecting the alternatives in (5.1). The following theorem
gives a rough lower bound on the signal size $\|\Theta_\tau\|_2$ that is needed for detectability.


Theorem 5.1. Let $0 < \alpha < \beta < 1$. Under the asymptotic regime $m/n \to \gamma \in (0, \infty)$, there
exist constants $C_i = C_i(\alpha, \beta, \gamma) > 0$ for $i = 1, 2$, such that

(i)    $\liminf_{n \to \infty}\ \inf_{\mathcal{D}_m(\|\Theta_\tau\|_2 \ge \varepsilon_n)} E[\phi_\alpha(S_\tau)] > \beta$ for $\varepsilon_n = C_1 \sqrt{n}$,

and

(ii)    $\liminf_{n \to \infty}\ \inf_{\mathcal{D}_m(\|\Theta_\tau\|_2 \ge \varepsilon_n)} E[\phi_\alpha(T_\tau)] > \beta$ for $\varepsilon_n = C_2 \sqrt{n}$.

Our proof of Theorem 5.1 uses rather general concentration bounds and it should be possible
to sharpen the analysis to show asymptotic power for $\phi_\alpha(S_\tau)$ and $\phi_\alpha(T_\tau)$ under smaller signal
strength. Indeed, we conjecture that a test based on $T_\tau$ can asymptotically attain uniform
power $\beta$ when the signal size $\|\Theta_\tau\|_2$ is of constant order $O(1)$. This conjecture is partially
supported by Theorem 5.2 below.


5.1. Rate-optimality under equicorrelation. When the joint distribution of $X^{(1)}, \dots, X^{(m)}$
is a regular Gaussian distribution, then $H_0$ is equivalent to $R - I_m = 0$, where $I_m$ is the $m$-by-$m$
identity matrix; recall that $R$ is the population Pearson correlation matrix. For any $\varepsilon > 0$,
define the alternative

(5.2) $\mathcal{N}_m(\|R - I_m\|_F \ge \varepsilon)$

as the family of regular $m$-variate Gaussian distributions whose correlation matrix $R$ satisfies
$\|R - I_m\|_F \ge \varepsilon$. Fix any $\alpha, \beta \in (0,1)$ with $\alpha < \beta$. A result of Cai and Ma (2013, Remark 1(a))
implies that in the regime $m/n \to \gamma$, there exists a sufficiently small constant $c = c(\alpha, \beta, \gamma) > 0$
such that

$\limsup_{n \to \infty} \; \inf_{\mathcal{N}_m(\|R - I_m\|_F \ge c)} \mathbb{E}[\phi] < \beta$

for any $\alpha$-level test $\phi$. In other words, asymptotically, no $\alpha$-level test can uniformly achieve the
desired power against the alternative (5.2) when the signal size $\|R - I_m\|_F$ is allowed to be as
small as $c$. It follows immediately that in our nonparametric setup there also exists a constant
$c = c(\alpha, \beta, \gamma) > 0$ such that

$\limsup_{n \to \infty} \; \inf_{\mathcal{D}_m(\|\Theta_\tau\|_2 \ge c)} \mathbb{E}[\phi] < \beta$
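for any $\alpha$-level test $\phi$. This is true because the nonparametric class $\mathcal{D}_m$ contains all $m$-variate
Gaussian distributions, and because the Kendall correlation of $X^{(p)}$ and $X^{(q)}$ is of the same order as
their Pearson correlation when $X^{(p)}$ and $X^{(q)}$ are jointly Gaussian. The latter fact follows from the
relation $\rho = \sin(\pi \tau / 2)$ between the Pearson correlation $\rho$ and Kendall’s tau $\tau$ of a pair of
coordinates, which holds for non-degenerate elliptical distributions; see Lindskog et al. (2003).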
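
The order comparison between the two signal norms can be made explicit in a few lines. The following worked step is an illustration only and is not taken from the original argument; it assumes that $\Theta_\tau$ collects the population Kendall correlations of the pairs $p < q$, and it writes $\rho_{pq}$ and $\tau_{pq}$ for the Pearson and Kendall correlations of the pair $(p, q)$, which is notation chosen here for convenience.

```latex
% For x in [0,1] we have x <= arcsin(x) <= (pi/2) x, because arcsin is convex
% on [0,1] with arcsin(0) = 0 and arcsin(1) = pi/2.
\[
  \tau_{pq} = \frac{2}{\pi}\arcsin(\rho_{pq})
  \quad\Longrightarrow\quad
  \frac{2}{\pi}\,|\rho_{pq}| \;\le\; |\tau_{pq}| \;\le\; |\rho_{pq}| .
\]
% Summing squares over the pairs p < q, and noting that the Frobenius norm
% counts every off-diagonal pair twice:
\[
  \|R - I_m\|_F^2 = 2\sum_{p<q}\rho_{pq}^2
  \quad\Longrightarrow\quad
  \frac{\sqrt{2}}{\pi}\,\|R - I_m\|_F
  \;\le\; \|\Theta_\tau\|_2 \;\le\;
  \frac{1}{\sqrt{2}}\,\|R - I_m\|_F .
\]
```

Under these assumptions, a lower bound on $\|R - I_m\|_F$ therefore translates into a lower bound on $\|\Theta_\tau\|_2$ of the same order, and vice versa.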


Given the observation just made, an $\alpha$-level test $\phi$ that satisfies

(5.3) $\liminf_{n \to \infty} \; \inf_{\mathcal{D}_m(\|\Theta_\tau\|_2 \ge C)} \mathbb{E}[\phi] > \beta$

for a large enough constant $C = C(\alpha, \beta, \gamma) > 0$ would be rate-optimal. If the signal $\|\Theta_\tau\|_2$ is
large, then our statistic $T_\tau$, being an unbiased estimator of $\|\Theta_\tau\|_2^2$, always centers around the same
large value regardless of the true underlying distribution of $X$. It is hence natural to conjecture
that the optimality condition (5.3) is satisfied by the test $\phi_\alpha(T_\tau)$, for a reasonable class of
elliptical distributions $\mathcal{D}_m$ that extends beyond the Gaussians. Our next result supports the
conjecture.

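Let $\mathcal{N}_m^{\mathrm{equi}}(\|\Theta_\tau\|_2 \ge C)$ be the set of $m$-variate Gaussian distributions that have all pairwise
(Pearson and thus also Kendall) correlations equal to a common value such that $\|\Theta_\tau\|_2 \ge C$.
If the common Kendall correlation equals $\theta$ for all $1 \le p \ne q \le m$, then $\|\Theta_\tau\|_2^2 = \binom{m}{2} \theta^2$.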


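Theorem 5.2. As $m/n \to \gamma$, there exists a constant $C = C(\alpha, \beta, \gamma) > 0$ such that

$\liminf_{n \to \infty} \; \inf_{\mathcal{N}_m^{\mathrm{equi}}(\|\Theta_\tau\|_2 \ge C)} \mathbb{E}[\phi_\alpha(T_\tau)] > \beta.$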


The theorem is proven in Appendix F. Our simulation experiments on power in Section 6
corroborate the conjecture made above.


5.2. Comparison with the “max” statistic. The work of Han and Liu (2014) considered
testing the independence hypothesis $H_0$ from (1.1) using maxima of rank correlations and, in
particular, the statistic

(5.4) $S_\tau^{\max} := \max_{1 \le p < q \le m} |\tau^{(pq)}|$

that is based on Kendall’s tau. Han and Liu (2014) derived the asymptotic null distribution
under the regime $\log m = o(n^{1/3})$. Let $\phi_\alpha(S_\tau^{\max})$ be the level $\alpha$ test that rejects for large values
of $S_\tau^{\max}$. Naturally, this test is powerful against alternatives belonging to the set


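(5.5) $\mathcal{D}_m(\|\Theta_\tau\|_\infty \ge \varepsilon_n) := \{D \in \mathcal{D}_m : \|\Theta_\tau\|_\infty \ge \varepsilon_n\},$

which is characterized by the max norm of $\Theta_\tau$. Indeed, when $\log m = o(n^{1/3})$, for a given
significance level $\alpha$ and targeted power $\beta \in (\alpha, 1)$, it was shown that there exists a constant
$C_1 = C_1(\alpha, \beta)$ such that

$\liminf_{n \to \infty} \; \inf_{\mathcal{D}_m(\|\Theta_\tau\|_\infty \ge C_1 \sqrt{(\log m)/n})} \mathbb{E}[\phi_\alpha(S_\tau^{\max})] > \beta.$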

Han and Liu (2014) also showed rate-optimality of this test, i.e., there exists a constant $C_2 =
C_2(\alpha, \beta) < C_1$ such that for any $\alpha$-level test $\phi$,

(5.6) $\limsup_{n \to \infty} \; \inf_{\mathcal{D}_m(\|\Theta_\tau\|_\infty \ge C_2 \sqrt{(\log m)/n})} \mathbb{E}[\phi] < \beta.$

Note that in the regime $m/n \to \gamma$ that we consider in this section we have $\log m = o(n^{1/3})$.

While a test based on $S_\tau^{\max}$ is rate-optimal in detecting alternatives of the form (5.5) characterized
by the max norm signal, it is, as intuition suggests, not powerful in detecting alternatives
with small but non-zero dependence among many pairs of random variables. The latter
scenario is best described via the Euclidean norm as in (5.1). This is demonstrated by the
following theorem about equicorrelation alternatives; recall the positive result in Theorem 5.2.


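Theorem 5.3. As $m/n \to \gamma$, there does not exist any constant $C = C(\alpha, \beta, \gamma) > 0$ such that

$\liminf_{n \to \infty} \; \inf_{\mathcal{N}_m^{\mathrm{equi}}(\|\Theta_\tau\|_2 \ge C)} \mathbb{E}[\phi_\alpha(S_\tau^{\max})] > \beta.$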


The proof of the theorem is deferred to Appendix F. It relies on a comparison lemma of
Chernozhukov et al. (2013) and a recent result on Gaussian approximation for high-dimensional
U-statistics in Chen (2016). Theorem 5.3 says that a signal size $\|\Theta_\tau\|_2$ of constant order is not
enough to guarantee a preset asymptotic power for a test based on $S_\tau^{\max}$ under the regime
$m/n \to \gamma$. We demonstrate this in our simulations in the next section.


6. Implementation and simulation experiments

We now compare several tests of the independence hypothesis $H_0$ based on specific versions
of the statistics introduced in this paper. Our simulations first explore the size of the tests when 
critical values are set using asymptotic normal approximations. We then compare their power. 
Before turning to the simulations, however, we discuss the computation of the test statistics. 


6.1. Implementation. In order to compute the statistics $S_h$ from (3.1) and $Z_h$ from (3.7)
for $m$ variables, one has to make $\binom{m}{2}$ evaluations of the U-statistics $U_h^{(pq)}$. In general, for a
U-statistic of degree $k$, a naive calculation following the definition in (2.2) requires $O(n^k)$
operations. Fortunately, more efficient algorithms are available for the specific examples covered here.
For instance, Spearman’s rho from Example 2.2 can be computed in $O(n \log n)$ operations.
The same is true for Kendall’s $\tau$ from Example 2.1 (Christensen, 2005). Similarly, Weihs
et al. (2016) showed how to compute the Bergsma-Dassios sign covariance $t^*$ in $O(n^2 \log n)$
operations despite the fact that its kernel has degree $k = 4$, as reviewed in Section 2. An
improvement to $O(n^2)$ was given by Heller and Heller (2016). Finally, Hoeffding (1948b) gives
formulas for efficient computation of his statistic $D$ in Section 5 of his paper.
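
To make the quantities in this paragraph concrete, the following minimal sketch evaluates all $\binom{m}{2}$ pairwise Kendall correlations and aggregates them into a sum of squares and a maximum. It is an illustration under stated assumptions, not the implementation used for the simulations: the function names are hypothetical, the tau computation is the naive $O(n^2)$ pair count rather than an $O(n \log n)$ algorithm, the data are assumed to contain no ties, and the centering and scaling required by the limit theorems are omitted.

```python
from itertools import combinations


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    """Naive O(n^2) Kendall's tau for two samples of equal length (no ties assumed)."""
    n = len(x)
    total = 0
    for a, b in combinations(range(n), 2):
        # +1 for a concordant pair of observations, -1 for a discordant pair
        total += sign((x[a] - x[b]) * (y[a] - y[b]))
    return total / (n * (n - 1) / 2)


def pairwise_taus(columns):
    """Kendall's tau for every pair p < q of the m coordinate samples in `columns`."""
    m = len(columns)
    return {
        (p, q): kendall_tau(columns[p], columns[q])
        for p, q in combinations(range(m), 2)
    }


def sum_of_squares(taus):
    """Unscaled sum of squared pairwise correlations."""
    return sum(t * t for t in taus.values())


def max_abs(taus):
    """Maximum absolute pairwise correlation."""
    return max(abs(t) for t in taus.values())
```

For example, `pairwise_taus([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]])` contains one perfectly concordant and two perfectly discordant pairs of coordinates, so the sum of squares is 3 and the maximum is 1.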

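The situation with the class of statistics $T_h$ from (3.6) is more complicated. Since a kernel
$h$ of degree $k$ gives rise to an induced kernel $h^W$ of degree $2k$, the number of operations equals
$O(n^{2k})$ if we compute the U-statistic $U^{(pq)}_{h^W}$ with kernel $h^W$ by naively following its definition.
This would lead to a total of $O(m^2 n^{2k})$ operations to find all $U^{(pq)}_{h^W}$, $1 \le p < q \le m$.
A more efficient way to compute each $U^{(pq)}_{h^W}$ in $O(n^k)$ time proceeds as follows.
Using (3.4) and (3.5), we see that

(6.1) $U^{(pq)}_{h^W} = \binom{n}{k}^{-1} \binom{n-k}{k}^{-1} \sum_{\mathbf{i} \in \mathcal{P}(n,k)} h_{\mathbf{i}} \, \bar{h}_{\mathbf{i}},$

where for each $\mathbf{i} \in \mathcal{P}(n, k)$, and suppressing the dependence on the pair $(p, q)$, we define
$h_{\mathbf{i}}$ as the value of $h$ at the observations indexed by $\mathbf{i}$, and

$\bar{h}_{\mathbf{i}} := \sum_{\mathbf{j} \in \mathcal{P}(n,k) \,:\, \mathbf{j} \cap \mathbf{i} = \emptyset} h_{\mathbf{j}}.$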

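Hence, it suffices to calculate (i) $h_{\mathbf{i}}$ for all $\mathbf{i} \in \mathcal{P}(n,k)$, (ii) $\bar{h}_{\mathbf{i}}$ for all $\mathbf{i} \in \mathcal{P}(n,k)$ and (iii) the
summation in (6.1), in that order. Evidently, step (i) involves $O(n^k)$ operations. By the
inclusion-exclusion principle,

(6.2) $\bar{h}_{\mathbf{i}} = \sum_{\mathbf{j} \in \mathcal{P}(n,k)} h_{\mathbf{j}} + \sum_{1 \le \ell \le k} (-1)^{\ell} \sum_{\mathbf{j}' \in \mathcal{P}(n,\ell),\; \mathbf{j}' \subset \mathbf{i}} h_{\mathbf{j}'},$

where $h_{\mathbf{j}'} := \sum_{\mathbf{j} \in \mathcal{P}(n,k) \,:\, \mathbf{j}' \subset \mathbf{j}} h_{\mathbf{j}}$ for $1 \le \ell \le k$ and $\mathbf{j}' \in \mathcal{P}(n, \ell)$. Note that there are $O(n^{\ell})$
many $\mathbf{j}' \in \mathcal{P}(n, \ell)$, and each $h_{\mathbf{j}'}$ is a sum of $O(n^{k-\ell})$ many terms. Finding $h_{\mathbf{j}'}$ for all $\mathbf{j}' \in \mathcal{P}(n, \ell)$
and $1 \le \ell \le k$ thus requires $O(n^k)$ operations, and with these as ingredients, by (6.2), one
can compute each $\bar{h}_{\mathbf{i}}$ in $O(1)$ operations if $\sum_{\mathbf{j} \in \mathcal{P}(n,k)} h_{\mathbf{j}}$ is already known. But the quantity
$\sum_{\mathbf{j} \in \mathcal{P}(n,k)} h_{\mathbf{j}}$ can be computed once, with another $O(n^k)$ computations. Consequently,
step (ii) involves $O(n^k)$ operations, and so does the final summation in step (iii).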
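
The three steps can be sketched for the Kendall kernel, where $k = 2$ and the inclusion-exclusion formula reduces to "total, minus the two row sums, plus the pair itself". The code below is a hedged illustration only: the function name `unbiased_tau_squared` is hypothetical, the kernel is taken to be the sign of the product of differences, ties are assumed absent, and the result is the single-pair quantity whose expectation is the squared Kendall correlation of that pair.

```python
from itertools import combinations
from math import comb


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def unbiased_tau_squared(x, y):
    """O(n^2) evaluation of the degree-4 U-statistic built from disjoint pairs.

    Averages h_i * h_j over all ordered pairs (i, j) of disjoint 2-subsets,
    where h_{ab} = sign((x_a - x_b) * (y_a - y_b)).
    """
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")

    # Step (i): kernel values h_i for all pairs i = {a, b}.
    h = [[0] * n for _ in range(n)]
    for a, b in combinations(range(n), 2):
        h[a][b] = h[b][a] = sign((x[a] - x[b]) * (y[a] - y[b]))

    # Marginal sums: row[a] is the sum of h_j over all pairs j containing a.
    row = [sum(h[a]) for a in range(n)]
    total = sum(row) // 2  # every pair is counted twice in the row sums

    # Steps (ii) and (iii): h_bar by inclusion-exclusion, then the final sum.
    acc = 0
    for a, b in combinations(range(n), 2):
        h_bar = total - row[a] - row[b] + h[a][b]  # sum over pairs disjoint from {a, b}
        acc += h[a][b] * h_bar

    return acc / (comb(n, 2) * comb(n - 2, 2))
```

On perfectly concordant data every kernel value equals 1, so the function returns 1. A brute-force double loop over disjoint pairs gives the same value on arbitrary data but needs $O(n^4)$ operations.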


6.2. Simulations. We first consider the sizes of tests based on our statistics $S_\tau$, $S_{\hat\rho_s}$, $S_{t^*}$, $T_\tau$,
$T_{\hat\rho_s}$ and $Z_{t^*}$ that we introduced in Section 3. For comparison, we also consider the sum of
squared Pearson correlations $S_r$ from Schott (2005); recall (1.3). Each test compares a rescaled
test statistic to the limiting standard normal distribution from Theorem 4.1 and Corollary 4.2.
Targeting a size of 0.05, the null hypothesis $H_0$ is rejected if the value of the rescaled statistic
exceeds the 95th percentile of the standard normal distribution. Table 2 gives Monte-Carlo
estimates of finite-sample sizes for different combinations of $n$ and $m$. The data underlying
the table are i.i.d. noncentral $t$ with $\nu = 3$ degrees of freedom and noncentrality parameter
$\mu = 2$. For each combination of $m$ and $n$, the sizes of the tests are calculated from 5,000
independently generated data sets. As expected, the sizes of all tests that use rank-based statistics
get closer to the nominal 0.05 as $m$ and $n$ increase, but the test based
on $S_r$ is not valid as it rejects too often. Recall that Schott’s limit theorem is derived under
a Gaussian assumption. For certain new non-parametric tests introduced in this paper, the
test sizes are not very satisfactory when $n$ is small, but they all get close to the nominal 0.05
level once $n$ reaches 128, indicating that the asymptotics described by Theorem 4.1 kick in.
Surprisingly, the test given by $S_{\hat\rho_s}$ has good size even for very small $n$. It would be of interest
to explore more refined results, such as a Berry-Esseen bound or an Edgeworth expansion for
the normal convergence in Theorem 4.1, in future research.

Next, we consider the power of the tests, as studied in Section 5. For different combinations
of $(m, n)$, we generate data as $n$ independent draws from three different $m$-variate elliptical
distributions. These are

(i) the $m$-variate normal distribution: $N_m(0, \Sigma)$,

(ii) the $m$-variate $t$ distribution: $t_{\nu = 20, m}(\mu = 2 \cdot 1_m, \Sigma)$, and

(iii) the $m$-variate power exponential distribution: $PE(\mu = 0, \Sigma, \nu = 20)$.

Here, $1_m$ is the $m$-vector with all entries equal to 1, and the parametrizations of these
distributions are in accordance with Oja (2010, pp. 8-10). For each distribution, the scatter matrix
$\Sigma = (\sigma_{ij})$ is taken to be a matrix with 1’s on the diagonal and equal values for the off-diagonal
entries, which are set to obtain the signal strengths $\|\Theta_\tau\|_2^2 = 0.1$, 0.3, and 0.7 based on Kendall’s
$\tau$. We refer again to Lindskog et al. (2003) for the relationship between $\Sigma$ and $\|\Theta_\tau\|_2^2$. The
power, computed based on 500 repetitions of experiments, for tests based on $S_\tau$, $T_\tau$, and the
statistic $S_\tau^{\max}$ of Han and Liu (2014) are compared in Table 3. As expected, $S_\tau^{\max}$ is not
well-adapted for detecting the alternatives we generated. For each $(m, n)$ combination and a given
value of $\|\Theta_\tau\|_2^2$, the power of the test based on $T_\tau$ is similar across different data-generating
distributions. In contrast, $S_\tau$ tends to yield more power for $t$-distributed data, and less power for
data with power exponential distribution. The stability of the power rendered by $T_\tau$ points to
our conjecture in Section 5 on the minimax optimality of $T_\tau$ over a wider class of distributions.
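
The calibration of the off-diagonal entry to a target signal strength can be sketched as follows. This is an illustration under assumptions rather than the code behind the reported tables: the function names are hypothetical, $\Theta_\tau$ is taken to collect the Kendall correlations of the $\binom{m}{2}$ pairs $p < q$, the common correlation is taken to be positive, and the elliptical relation $\tau = (2/\pi) \arcsin(\rho)$ from Lindskog et al. (2003) is used to pass between Kendall and Pearson correlations.

```python
from math import asin, comb, pi, sin, sqrt


def equicorrelation_for_signal(m, signal_sq):
    """Common off-diagonal scatter entry giving squared Kendall signal `signal_sq`.

    Assumes all m*(m-1)/2 pairs share the same positive Kendall correlation tau,
    so that signal_sq = comb(m, 2) * tau**2.
    """
    tau = sqrt(signal_sq / comb(m, 2))
    if tau > 1:
        raise ValueError("target signal is too large for this dimension")
    return sin(pi * tau / 2)  # inverse of tau = (2 / pi) * asin(rho)


def signal_from_equicorrelation(m, rho):
    """Squared Kendall signal implied by a common off-diagonal entry rho."""
    tau = 2 / pi * asin(rho)
    return comb(m, 2) * tau ** 2
```

For instance, `equicorrelation_for_signal(64, 0.7)` returns the common entry for dimension 64 and target 0.7, and `signal_from_equicorrelation` maps it back to the target as a consistency check.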

When the data are generated from multivariate normal distributions, Table 3 includes a
comparison to three further tests. First, Schott’s $S_r$ from (1.3) yields a valid (asymptotic) test
in this case. As seen in Table 3, the three statistics $S_\tau$, $T_\tau$ and $S_r$ give comparable power
for different combinations of $(m, n)$ and signal strength $\|\Theta_\tau\|_2^2$. Second, we tried the likelihood
ratio test (LRT) with critical rejection region calibrated based on Corollary 1 in Jiang and Qi
(2015) whenever it is implementable, i.e. when $m < n$ in the table. It is generally less powerful
than our new tests and $S_r$ in detecting the alternatives we consider. Lastly, we experimented
with the statistic proposed in Cai and Ma (2013), which again demonstrates similar power.
The test of Cai and Ma (2013) is minimax rate optimal in detecting the Frobenius norm signal
$\|\Sigma - I_m\|_F$, but only for testing the different hypothesis $H_0 : \Sigma = I_m$ and under a Gaussian
assumption on $X$. Under Gaussianity, our hypothesis of independence $H_0$ from (1.1) is of course
equivalent to $R = I_m$ instead. Despite this mismatch, the comparable power of the test of
Cai and Ma (2013) indicates that the three statistics $S_\tau$, $T_\tau$ and $S_r$ are all powerful in detecting
the signal $\|R - I_m\|_F \asymp \|\Theta_\tau\|_2$; recall that our experiment has $\Sigma$ with 1’s on the diagonal so
that $\Sigma = R$. Lastly, we speculate that $S_r$ is minimax optimal in detecting the signal $\|R - I_m\|_F$
for the null hypothesis $H_0$ under a Gaussian assumption on $X$, although to our knowledge this
has not yet been demonstrated theoretically in the literature; see also the last section of Cai
and Ma (2013) for other related open problems.


To provide further evidence for the conjectures we have made, we repeated the above
simulation study in a case without equicorrelation. Specifically, we generated data from elliptical
distributions with scatter matrices $\Sigma$ that are pentadiagonal. The precise setup has $\Sigma$ with 1’s
on the diagonal, equal values for the entries $\sigma_{ij}$, $1 \le |i - j| \le 2$, and zeros elsewhere. The
results are reported in Table 4 and lead to similar conclusions as Table 3.

Finally, in Table 5 we report Monte Carlo estimates of power in a setting of data
contamination and without restricting solely to Kendall’s tau. We generate data as $n$ independent random
vectors $X_1, \dots, X_n$ whose $m$ coordinates are dependent. Each $X_i$ is multivariate normal, with
mean vector zero and pentadiagonal covariance matrix. Precisely, $X_i \sim N_m(0, \Sigma_{\mathrm{band2}})$, where
$\Sigma_{\mathrm{band2}} = (\sigma_{ij})$ has diagonal entries $\sigma_{ii} = 1$ and entry $\sigma_{ij} = 0.1$ if $1 \le |i - j| \le 2$ and $\sigma_{ij} = 0$
if $|i - j| \ge 3$. For each combination of $(n, m)$, we randomly select 5% of the $nm$ values of the
data matrix to be contaminated. Each selected value is replaced by an independent draw from
$N(2.5, 0.2)$ multiplied with a random sign. Such outliers tend to decrease observed correlations,
but the rank correlations are affected less than Pearson correlations. The empirical power of
these tests is computed based on 500 repetitions of experiments. As the results in Table 5 show,
Schott’s $S_r$ tends to give smaller power than the other statistics. At the larger sample sizes,
when the tests have approximately nominal size (recall Table 2), the ‘Kendall statistics’ $S_\tau$ and
$T_\tau$ show rather similar power, and the same happens for $S_{\hat\rho_s}$ and $T_{\hat\rho_s}$. For the Bergsma-Dassios
statistics, there is some evidence that $Z_{t^*}$ has greater power than $S_{t^*}$ in this setting.
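
The contamination design can be sketched in a few lines of standard-library Python. This is a hedged illustration, not the code used for Table 5: all function names are hypothetical, the second parameter of $N(2.5, 0.2)$ is assumed here to be a variance (the text does not say whether it is a variance or a standard deviation), and the number of contaminated cells is obtained by rounding 5% of $nm$.

```python
import random
from math import sqrt


def band2_covariance(m, off=0.1):
    """Pentadiagonal matrix: 1 on the diagonal, `off` where 1 <= |i - j| <= 2."""
    return [
        [1.0 if i == j else (off if abs(i - j) <= 2 else 0.0) for j in range(m)]
        for i in range(m)
    ]


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    m = len(a)
    low = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = sqrt(s) if i == j else s / low[j][j]
    return low


def contaminated_sample(n, m, frac=0.05, out_mean=2.5, out_var=0.2, seed=0):
    """Draw n banded-Gaussian vectors, then overwrite a fraction of the cells.

    Returns the n-by-m data (list of rows) and the flat indices of the
    contaminated cells. `out_var` is treated as a variance (an assumption).
    """
    rng = random.Random(seed)
    low = cholesky(band2_covariance(m))
    data = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]
        data.append([sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(m)])
    cells = rng.sample(range(n * m), round(frac * n * m))
    for c in cells:
        r, col = divmod(c, m)
        outlier = rng.gauss(out_mean, sqrt(out_var))
        data[r][col] = rng.choice((-1.0, 1.0)) * outlier  # random sign
    return data, cells
```

The banded matrix is strictly diagonally dominant (off-diagonal row sums are at most 0.4), so the Cholesky factorization is well defined for every $m$.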


6.3. Comparison of the statistics. When data are approximately Gaussian, the statistic $S_r$
of Schott (2005) yields a powerful test. Since the computation of a Pearson correlation is linear
in the sample size $n$, it is inexpensive to compute, and its distribution is well-approximated by
a normal limit at surprisingly small sample sizes (see Table 1 in the original paper of Schott
(2005)). However, as one would expect, our simulations show that the size of the test may be
far from nominal in non-Gaussian settings.

The Kendall and Spearman ‘sum of squares’ $S_\tau$ and $S_{\hat\rho_s}$ are attractive alternatives that are
nearly as efficient to compute as $S_r$. The use of rank correlations guards against effects of
non-Gaussianity all the while leading to rather little loss in power when data are indeed Gaussian.
Compared to $S_{\hat\rho_s}$, $S_\tau$ requires somewhat larger samples for the normal approximation to the
null distribution to be useful.

The statistics $T_h$ similarly guard against non-Gaussianity but are computationally more
costly to use. However, as we explored in the case of $T_\tau$, their unbiasedness as estimators of
the signal strength leads to power that is similarly large across very different alternatives. We
consider this an attractive feature and conjecture that these statistics are minimax optimal, at
least for a wide class of elliptical distributions.

Another interesting assessment of independence is obtained by using the statistics $Z_{t^*}$ and $Z_D$
based on Bergsma and Dassios’ sign covariance $t^*$ and Hoeffding’s $D$, respectively. Both $t^*$ and
$D$ have the intriguing property of providing a consistent assessment of pairwise independence.
For continuous observations, their expectations are zero if and only if the considered pair of
random variables is independent. In the case of $t^*$, this also holds for discrete variables. Under
independence, $t^*$ and $D$ are degenerate U-statistics (Nandy et al., 2016). The computational
cost of their use in $Z_{t^*}$ and $Z_D$ is comparable to that of $T_\tau$. However, determining the signal
strength relevant for $Z_{t^*}$ is more complicated than for $T_\tau$. We are not aware of any literature
that would offer a simple relationship between the expectation of $t^*$ or $D$ and the scatter matrix
of an elliptical distribution.


7. Discussion

This paper treats nonparametric tests of independence using pairwise rank correlations or,
more precisely, rank correlations that are also U-statistics. As reviewed in Section 2, the
motivating examples are Kendall’s tau and Spearman’s rho but also Hoeffding’s $D$ and Bergsma
and Dassios’ sign covariance $t^*$. The latter two correlations allow for consistent assessment of
pairwise independence but form degenerate U-statistics. With a view towards alternatives in
which dependence is “spread out over many coordinates”, we proposed statistics that are formed
as sums of many pairwise dependency signals as explained in Section 3. In a high-dimensional
regime in which both the number of variables $m$ and the sample size $n$ tend to infinity, we
derived normal limits for the null distributions of these statistics (Section 4). Our general
framework gives results for U-statistic degeneracy of order up to two. Finally, we explored
aspects of power theoretically and in simulations (Sections 5 and 6).

Under the null hypothesis of independence, the $m$ rank vectors are independent, each
following a uniform distribution on the symmetric group $\mathfrak{S}_n$. In small to moderate size problems,
we may thus implement exact tests using Monte Carlo simulation to compute critical values.
However, for large-scale problems and/or when using the computationally more involved $t^*$ or
$D$, the asymptotic normal distributions we derived furnish accurate approximations and allow
for great computational savings.
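
The Monte Carlo calibration mentioned here can be sketched as follows. This is a minimal illustration under assumptions, not a prescribed procedure: the function names are hypothetical, the statistic is the unscaled sum of squared pairwise Kendall correlations computed by a naive $O(n^2)$ pair count, and the null distribution is simulated by drawing $m$ independent uniform random permutations as rank vectors.

```python
import random
from itertools import combinations
from math import ceil


def kendall_tau(x, y):
    """Naive O(n^2) Kendall's tau (no ties assumed)."""
    n = len(x)
    total = 0
    for a, b in combinations(range(n), 2):
        prod = (x[a] - x[b]) * (y[a] - y[b])
        total += (prod > 0) - (prod < 0)
    return total / (n * (n - 1) / 2)


def sum_sq_tau(columns):
    """Unscaled sum of squared Kendall correlations over all pairs of columns."""
    return sum(
        kendall_tau(columns[p], columns[q]) ** 2
        for p, q in combinations(range(len(columns)), 2)
    )


def monte_carlo_critical_value(n, m, level=0.05, reps=1000, seed=0):
    """Simulated (1 - level) quantile of the statistic under independence."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        # Under the null, each rank vector is a uniform random permutation.
        ranks = [rng.sample(range(1, n + 1), n) for _ in range(m)]
        draws.append(sum_sq_tau(ranks))
    draws.sort()
    return draws[ceil((1 - level) * reps) - 1]
```

A test would then reject when the observed `sum_sq_tau` of the data columns exceeds the returned value. Because the statistic depends on the data only through ranks, the simulated critical value does not depend on the marginal distributions, provided these are continuous.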

Our study of power has focused on the case of Kendall’s tau. In a minimax paradigm and
for Gaussian equicorrelation alternatives we showed rate-optimality for the test based on $T_\tau$,
the unbiased estimator of the signal strength defined via (3.6) with kernel $h = h_\tau$. It would
be an interesting problem for future work to prove such rate-optimality more broadly, for more
general alternatives as well as other kernels. In particular, for the kernel associated to Kendall’s
tau, we conjectured in Section 5.1 that rate-optimality holds for alternatives from a wide class
of elliptical distributions.

Appendix A. Motivation of Schott’s statistic as a Rao score


We show that, up to a rescaling by the squared sample size, the statistic $S_r$ from (1.3) is Rao’s
score statistic in the multivariate normal setting. Let $X_1, \dots, X_n$ be i.i.d. $m$-variate normal
random vectors with mean vector $\mu$ and precision matrix $K = \Sigma^{-1}$. Let $\bar{X} := \frac{1}{n} \sum_{i=1}^n X_i$ be the
sample mean vector, and let $W = (w^{(pq)}) := \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})^T$ be the sample covariance
matrix. The score test considers the gradient of the multivariate normal log-likelihood
function

$\ell(\mu, K) = \frac{n}{2} \log|K| - \frac{1}{2} \sum_{i=1}^n (X_i - \mu)^T K (X_i - \mu)$

at the maximum likelihood estimate $(\hat\mu_0, \hat{K}_0)$ under the null hypothesis $H_0$ from (1.1).
Specifically, the score test rejects $H_0$ for large values of


(A.l) 




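where $I(\mu, K)$ is the Fisher-information matrix, $\hat\mu_0 = \bar{X}$ and $\hat{K}_0 = \mathrm{diag}(1/w^{(11)}, \dots, 1/w^{(mm)})$,
with $w^{(pp)}$ denoting the $p$-th diagonal entry of $W$. Routine calculations show that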




= 0 

’ i9k(P9) 

1 

- 1 

M=Ao A=A'o 

M=AoA=Ao 


0 if p = g, 

if p < q- 


Moreover, for p < g and p' < q', 


(A.3) 


^Ao Ao 

'(I 
0 


{[Ko]pp[Ko]qq) if {p,q) = {p',q’), 

if ip,q) {p',q'), 


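where $\mathbb{E}_{\hat\mu_0, \hat{K}_0}$ means taking expectation under a multivariate normal distribution with mean
$\hat\mu_0$ and precision matrix $\hat{K}_0$. In light of (A.2) and (A.3), one obtains that the statistic from
(A.1) is equal to a multiple of Schott’s statistic $S_r$ from (1.3), the factor being the rescaling
by the squared sample size mentioned at the start of this appendix.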


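Appendix B. Technical lemmas

The following lemma will be used to prove both Lemmas B.2 and B.3 below, as well as
Lemmas 3.1 and 3.2. We make use of the following notion of multisets. For $1 \le k \le n$,
if $\mathbf{i}^1, \dots, \mathbf{i}^r$ are tuples in $\mathcal{P}(n, k)$, let the pair $(\bigcup_{\omega=1}^r \mathbf{i}^\omega, f_m)$ be the multiset associated with
$\mathbf{i}^1, \dots, \mathbf{i}^r$, where $f_m : \bigcup_{\omega=1}^r \mathbf{i}^\omega \to \mathbb{N}$ is the multiplicity function such that $f_m(i)$ is the number of
occurrences of index $i$ in the sets $\mathbf{i}^1, \dots, \mathbf{i}^r$.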


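Lemma B.1. Let $h : (\mathbb{R}^2)^k \to \mathbb{R}$ be a kernel that is symmetric in its $k$ arguments and has
order of degeneracy $d$ under $H_0$.

(i) Suppose $\mathbf{i}^1, \dots, \mathbf{i}^4 \in \mathcal{P}(n, k)$. If $|\bigcup_{\omega=1}^4 \mathbf{i}^\omega| > 4k - 2d$, then

$\mathbb{E}_0\left[\prod_{\omega=1}^4 h\bigl(R^{(p^\omega q^\omega)}_{\mathbf{i}^\omega}\bigr)\right] = 0$

for all $1 \le p^\omega \ne q^\omega \le m$, $\omega = 1, \dots, 4$. If $|\bigcup_{\omega=1}^4 \mathbf{i}^\omega| = 4k - 2d$, then
$\mathbb{E}_0[\prod_{\omega=1}^4 h(R^{(p^\omega q^\omega)}_{\mathbf{i}^\omega})]$
is nonzero only if $|\mathbf{i}^\omega \cap (\bigcup_{\omega' \ne \omega} \mathbf{i}^{\omega'})| = d$ for all $\omega = 1, \dots, 4$, and in this case the
multiplicity function $f_m$ of the multiset $(\bigcup_{\omega=1}^4 \mathbf{i}^\omega, f_m)$ takes value either 1 or 2.

(ii) Suppose $\mathbf{i}^1, \dots, \mathbf{i}^8 \in \mathcal{P}(n, k)$. If $|\bigcup_{\omega=1}^8 \mathbf{i}^\omega| > 8k - 4d$, then

$\mathbb{E}_0\left[\prod_{\omega=1}^8 h\bigl(R^{(p^\omega q^\omega)}_{\mathbf{i}^\omega}\bigr)\right] = 0$

for all $1 \le p^\omega \ne q^\omega \le m$, $\omega = 1, \dots, 8$. If $|\bigcup_{\omega=1}^8 \mathbf{i}^\omega| = 8k - 4d$, then
$\mathbb{E}_0[\prod_{\omega=1}^8 h(R^{(p^\omega q^\omega)}_{\mathbf{i}^\omega})]$
is nonzero only if $|\mathbf{i}^\omega \cap (\bigcup_{\omega' \ne \omega} \mathbf{i}^{\omega'})| = d$ for all $\omega = 1, \dots, 8$, and in this case the
multiplicity function $f_m$ of the multiset $(\bigcup_{\omega=1}^8 \mathbf{i}^\omega, f_m)$ takes value either 1 or 2.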


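Proof. We consider the first claim (i). Since $\mathbf{i}^1, \dots, \mathbf{i}^4$ are tuples in $\mathcal{P}(n, k)$, the multiplicity
function $f_m$ of the multiset $(\bigcup_{\omega=1}^4 \mathbf{i}^\omega, f_m)$ is such that $\sum_{i \in \bigcup_\omega \mathbf{i}^\omega} f_m(i) = 4k$. If $|\bigcup_{\omega=1}^4 \mathbf{i}^\omega| >
4k - 2d$, the cardinality of the set $\{i \in \bigcup_{\omega=1}^4 \mathbf{i}^\omega : f_m(i) = 1\}$ must be greater than $4k - 4d$, in
which case there exists an $\omega'$ so that $c := |\mathbf{i}^{\omega'} \cap (\bigcup_{\omega \ne \omega'} \mathbf{i}^\omega)| < d$. By symmetry, we may assume
$\omega' = 1$ without loss of generality.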

Let j = n (Uu,^ii“) as sets. Then, conditional on \ we have that 

/i(R^f is independent of all other factors for w = 2,..., 4. Since h has order of 

degeneracy d under Hq, by (12.81) . Eo[/i(R|f '^ ^] = 0. Therefore, by the aforementioned 

conditional independence. 


En 


nK 


R 


ip“q'^) 






= 0 


as a function of Xj^ \ This in turn implies that Eo[ni!,=i ^)] = 0. 

The necessary condition for Eo[nt=i ^)] to be nonzero when | i^l = Ak — 2d 

can be argued similarly, and we omit the details. 

The proof of (ii) is analogous to that of (i). Again, we omit the details. □ 


The following three lemmas will be used to prove Lemma FE. II Recall the notational short¬ 
hand := ~ k-h A < p < q < m, defined in the proof of Theorem 14.II 


Lemma B.2. Suppose $1 \le p, q, l, u \le m$ are four distinct indices, and $h$ is a kernel of order of
degeneracy $d$ satisfying Assumption 2.2 under $H_0$. Then


Eq 


h h h h 




Proof. Without loss of generality, we prove the result for {p, q, I, u) = (1, 2, 3,4). Note that for 
any four distinct indices 1 < pi,p 2 ,P 3 ,P 4 < rn, the antiranks R(pi)I(p 2 )^ j^(p 2 )|(p 3 )^ J{^(p 3 )I(p 4 ) 
are independent. Since C/L®), 1/(24) functions of R(^(I(®(, R(^(I(®1, R(^(I(^(, 

r(2 )I(4), respectively, on expansion. 


En 


f/(13) jy (23) Jj(14) Jj(24) 


= En 


2 


= En 


(^(13)) |'y(23)^ J'^(14)^ |'^(24)^ 


4 

l^h 


dj \ n'^ ) 


+ O , 


where the last equality follows from Lemma EH*). The proof is completed if we are able to 
show that 


(B.l) 


Eo 




(d^\‘ 

\n<^ ) 


+ O . 
For $\mathbf{i}^\omega \in \mathcal{P}(n, k)$, $\omega = 1, \dots, 8$, we define

(B.2) p(i\...,i8)= 

\uj^l 

Then on expansion, 



(B.3) 


En 




-8 


E 

^V{n,k) 

l<uj<S 


Eo [P(i\...,i8)] . 


Each summand Eo[P(i^,..., i®)] on the right hand side of (IB.3I) depends on the multiset 
fm)- If I U® i“| > 8 fc - 4d, by LemmalEIlM)) Eo[P(i\ ..., i®)] = 0 . 

If I U^=i i‘^1 = 8 fc — 4d, by Lemma iBJl' gbd . for Eo[P(i^,..., i®)] to be non-zero it is necessary 
that |i“ n = d for all uj' = 1,...,8, in which case fm takes the value I or 2. 

Suppose this is true. Under Hq, conditioning on X[in(i 2 ui 5 ui 6 ) and is 

independent of all other multiplicative factors on the right hand side of (IB.21) . If intersects 
with the set U®^ 3 i“ \ at least one of fl (i^ U i® U i®) and fl (i^ U i® U i"') has cardinality 
less than d given that fm ^ 2, and then Assumption 12.21 yields that 


En 




( 13 ) \ 


x; 


( 1 ) 


'V'- 

iin(i2ui5ui6)’ ^iinCi^Ui^uH) 


( 2 ) 


= 0 . 


3 :5 ;7 


Hence, Eo[P(i^,..., i®)] = 0 by the aforementioned conditional independence. Similarly, i' 
can only intersect with U, i®, i®, respectively, to ensure that Eo[P(i®,..., i®)] does not equal zero. 
When this is the case, we have that |U H i“’+®| = rf for ui = 1,3, 5,7, and then the four sets 
i® n U, i® n U, i® n i®, f n i® are disjoint and Eo[P(i®,..., i®)] = 

As a result, when | = 8k — 4d, Eo[P(i®,..., i®)] is only nonzero with value (Cd foi 


8k — Ad 

8k — AdJ \2k — d, 2fc — d, 2k — d, 2k — d 


2k — d 
d 


^ ^2k - 2d\ 
k-d 


n\ 


(n-8fc-h4d)!((fc-d)!)8(d!)4 

choices of (i®,..., i®). This count is obtained as follows. First, pick 8k — Ad indices from the 
set {1,..., n}, and note that there are { 2 k-d 2 ]^-d 2 k-d 2 k-d) '"^^lys of partitioning the 8k — Ad 
indices into the four sets i® D i^, i® fl i®, i® fl i®, i®" fl i®. For each w S {1, 3, 5, 7}, there are 
choices for the d shared common indices in i™ Hi’"®'®, and there are ways of distributing 

the remaining 2k —2d indices to i“ and i™+®. Since the count of the summands Eo[P(i®,..., i®)] 
with I U® U| < 8 fc — 4d is of the order 0(n®®‘“®‘^“®), we find from (IB.3P that 


Eo 




-8 


(Qrnl 


^{n - 8k + Ady.iik - df-Yidiy 
'id& 


0 ( 


^ 8 k— 4 d—l 


)) 


^4d 


+ O (n-4''-®) . 


This concludes the proof of dHU). 


□ 

Lemma B.3. Suppose $1 \le p, q, l, u \le m$ are four distinct indices, and $h$ is a kernel of order of
degeneracy $d$ satisfying Assumption 2.2 under $H_0$. Then


Eo 






Proof. Again, without loss of generality, we prove the result for $(p, q, l, u) = (1, 2, 3, 4)$. Given
$\mathbf{i}^\omega \in \mathcal{P}(n, 2k)$, $\omega = 1, \dots, 4$, we define

(B.4) Q{i\ ...,i^) = h^ 


iyci“ 

|i“|=fc 


where := h h • By the definition from (13.51) . on expansion, 

Eo 


W/(13)^(23)^(14)^(24) 


1 


\2k) \^£V{n,2k), 

l<w<A 


(B.5) 

It now suffices to show that 
(B.6) Eo 


1 


({X))‘ 


E E«« 

V^^V{n,2k), 

1^=1,....4 |i“|=/c 


Am ,(23) ,(14) ,(24) 

ii.il P.T^ i3.l3 V.i^ 


, (13) ^ , (23) ^ , (14) ^ , (24) 


= 0 


whenever | > 8k — Ad, because then the right hand side of (jB.5p is of the order 

The value of a term 

(B.7) hg • .h (R(f)) h , 

depends on the multi set , fm), where fm '■ —>■ N is the multiplicity function 

with fmii) equal to the number of occurrences of i among the eight tuples 

(B. 8 ) e V{n,k), 

and I]iGuf,^ii“ fmii) = S>k. If I = | (i"^) U (i‘^ \ i"^)! >8k- Ad, by Lemma EUff), 

~2 ■ ^^ 3-3 • = 0. We are left with the case | = 8k — Ad. 

B I ^tj=i = 8k — Ad, then Lemma iBdT fil yields that for Eo[/i|^ to 

be non-zero, it is necessary (but not sufficient, as seen below) that each of the eight tuples in 
(IB. 8 |) intersects with the union of the other seven at exactly d elements, with fmii) E 2 for all 
i G . In particular, since is disjoint from \ it is the case that 

(B.9) |Pn(uL 2 i")l=rf- 

When conditioning on and X~j^^. 3 , it is seen that ^(Rij^^^) is independent of the other 

multiplicative factors on the right hand side of (IB.71) . Note that since fm is always less than 
or equal to 2, by (IB.91) one of (d if and (d must have cardinality less than d. Hence, by 
Assumption 12. 21 we have that 


Ef 




vfS) , 5 r(l) 
iini^’ Tini 


ni^ 


= 0 , 
and the aforementioned conditional independence yields the claim from (B.6).


□ 


Lemma B.4. Suppose $1 \le p, q, l, u \le m$ are four distinct indices, and $h$ is a kernel of order of
degeneracy $d$ satisfying Assumption 2.2 under $H_0$. Then


En 


7 -r(p 0 r r(90 r r(P“) 7-r(9“) 




Proof. The proof uses counting techniques similar to those of Lemmas B.2 and B.3 and is
simpler. We only sketch the argument. Without loss of generality, let $(p, q, l, u) = (1, 2, 3, 4)$.
On expansion, by defining B{\^, • ■ •, i'^) ■= h h h h , 


(B.IO) 


En 


j^(13)j^(23)j^(14)j^(24) 


^ Eo[R(i\...,i4)]. 


i‘^G'P(n,/c) 

l<u;<4 


By Lemma IBTT O. Eq , i^)] = 0 if | i‘^| > Ak — 2d. When | 0^,^^ = 4:k — 2d, one 

can also show Eq [^*(1^, ■. •, i^)] = 0 by using Lemma llj.ir O and the property of the kernel given 
by Assumption l2.2l Hence, there are at most summands on the right hand side of 

(iBTOll and we conclude that Eq 


.,4k—2d—l\ _ — Q 


= 0(r 




Appendix C. Proofs for Section 2

Proof of Lemma 1X71 Claim (i) holds because the independence of ..., implies that 
the rank vectors R*^^\..., R^™) are i.i.d. For assertion (ii), note that, by the permutation 
sym metry of q in its n arguments, is a function of the antirank of in relation to 


X*^P^ ( Haiek et al.l . Il999l . p. 63). These antiranks, which we denote by are uniformly 


distributed on ©„ for any fixed choice of X*^p\ which yields the independence of ^nd 
Similarly, is independent (Of course, X^^*^ and together determine Claim 

(Hi) holds since the independence of X^^), • ■ •, X^™) implies that the m — 1 vectors of antiranks 
p,(OI(p) for p I are mutually independent. Finally, the pairwise independence stated in {iv) 
is implied by the independence of X^^\ ..., X^™) and (Hi). □ 


Appendix D. Proofs for Section 3

Proof of Lemma 3.1. It remains to prove claim (iii) about the fourth moment of $U_h^{(pq)}$ when
the kernel $h$ has its order of degeneracy $d$ equal to 1 or 2 under $H_0$. Without loss of generality,
we can assume $(p, q) = (1, 2)$. The fourth moment can be written as


(D.l) 


En 




-4 


E 




The value of each summand Eq in (ID. II) depends on the multiset fm) 

with 


(D.2) ^ Uii) = 4fc; 

we use the multiset notation introduced in the first paragraph of Appendix |B] 

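By Lemma B.1(i), we have $\mathbb{E}_0[\prod_{\omega=1}^4 h(R^{(12)}_{\mathbf{i}^\omega})] = 0$ if $|\bigcup_{\omega=1}^4 \mathbf{i}^\omega| > 4k - 2d$. If $|\bigcup_{\omega=1}^4 \mathbf{i}^\omega| <
4k - 2d$, there are at most $O(n^{4k-2d-1})$ choices for the set $\bigcup_{\omega=1}^4 \mathbf{i}^\omega$. Since $h$ is bounded, it thus
holds that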



E ^0 

|U^=ii“|<4fc-2d 


n '■ K’) 




Therefore, to complete the proof, it suffices to show that 


(D.3) 


0" i: 

1- 

1_ 

^ ^ :d^V{n,k) 

|u{,^ii"|=4fc-2d 

.u;=l 


3fc^Ci) 3) jf d=l, 

(2)'^((C2)'+V)+0(n-5) if d=2. 


By Lemma [B. If f I. when | i“| = 4fc — 2d, a summand ]Eo[nhi on the left 

hand side of (jD.Sp is non-zero only if 

(D.4) |i“ n )| = d for all w = 1,..., 4. 

For both $d = 1$ and $d = 2$, (D.4) is true when the set $\{1, 2, 3, 4\}$ can be partitioned into two
disjoint sets $\Omega_1$ and $\Omega_2$ such that


(D.5) 


|12i| = | 122 | = 2 and | = | = d. 


in which case (U^jeOii^) FI (Uc^gnai^) = 0 and, by independence. 


(D.6) 


Eq 


n ('■(<’)) 



n ('■ {<’)) 


{&■ 


Next, we count how many summands on the left hand side of (D.3) have their indices $\mathbf{i}^1, \dots, \mathbf{i}^4$
satisfying the constellation in (D.5). There are $\binom{n}{4k-2d}$ choices for the set $\bigcup_{\omega=1}^4 \mathbf{i}^\omega$. Then there
are $\frac{1}{2}\binom{4k-2d}{2k-d}$ partitions of $\bigcup_{\omega=1}^4 \mathbf{i}^\omega$ into two subsets of equal cardinality. Each of these subsets
with cardinality $2k - d$ is to be split into two subsets that have $d$ elements in common. We
have $\binom{2k-d}{d}$ choices for the common elements, and there are $\frac{1}{2}\binom{2k-2d}{k-d}$ ways of partitioning the
remaining elements to form the two subsets. In the above counting process, no ordering is taken
into account. Hence, the number of summands in (D.1) whose indices $\mathbf{i}^1, \dots, \mathbf{i}^4$ satisfy (D.5) is

(D.7) $4! \binom{n}{4k-2d} \cdot \frac{1}{2}\binom{4k-2d}{2k-d} \cdot \left[\binom{2k-d}{d} \frac{1}{2} \binom{2k-2d}{k-d}\right]^2 = \frac{3\, n!}{(n - 4k + 2d)! \, (d!)^2 \, ((k-d)!)^4}.$

When d = 1, for any four tuples i^,..., G Vin, k) with | i“| = 4fc — 2d = 4fc — 2, (ID. 41) 

is only satisfied when they can be described by the constellation in (jD.5p . Since 


(D.8) 



3n! 


4fc-h2d)! dl{{k-d)lY 


k^ Sidiy 

d) 




by (ID. 61) and (ID. 71) . we have proved the equality in (ID. 31) for d = 1. 

When $d = 2$, in addition to (D.5), there is another constellation for $\mathbf{i}^1, \dots, \mathbf{i}^4 \in \mathcal{P}(n, k)$ that
satisfies the condition in (D.4) subject to $|\bigcup_{\omega=1}^4 \mathbf{i}^\omega| = 4k - 2d = 4k - 4$. If, up to relabeling of
superscripts $\{1, \dots, 4\}$ for $\mathbf{i}^1, \dots, \mathbf{i}^4$, the multiset $(\bigcup_{\omega=1}^4 \mathbf{i}^\omega, f_m)$ is such that

(D.9) = 1 and 

'2 if i belongs to any one of D fl fl or i'* fl 


(D.IO) 


fm(i) — 


1 otherwise, 
then (ID.4I) is satisfied with 


(D.ll) 


En 


n ('•(<’)) 


,UJ — 1 


= V 


We will conclude the proof of (D.3) for $d = 2$ by showing there are

(D.12)    $3 \cdot 4! \cdot \binom{n}{4k-4}\binom{4k-4}{4}\binom{4k-8}{k-2,\,k-2,\,k-2,\,k-2} = \frac{3\,n!}{(n-4k+4)!\,((k-2)!)^{4}}$

choices of $i^1, \dots, i^4$ that satisfy (D.9) and (D.10), possibly after relabeling of their superscripts.


If so, since (”) = ( 2 ) + 0(n '^), combining (jP.SD with the summand 

values (jP.Bp and (ID.111) . we have shown that for d = 2, the left hand side of (ID.31) equals 


3(2!)' 


{(d)" + id) + o(..-»). 


It remains to show the count in (D.12). First, we count how many such constellations there
are without any relabeling of superscripts. Given each of the $\binom{n}{4k-4}$ choices for the set $\cup_{\omega=1}^{4} i^{\omega}$,
there are $4!\binom{4k-4}{4}$ ways of picking the disjoint singleton sets $(i^1 \cap i^2)$, $(i^2 \cap i^3)$, $(i^3 \cap i^4)$ and
$(i^4 \cap i^1)$. Now there are $\binom{4k-8}{k-2,\,k-2,\,k-2,\,k-2}$ ways to partition the remaining $4k - 8$ elements of
the set $\cup_{\omega=1}^{4} i^{\omega}$ into the four sets $i^1 \setminus (i^2 \cup i^4)$, $i^2 \setminus (i^1 \cup i^3)$, $i^3 \setminus (i^2 \cup i^4)$ and $i^4 \setminus (i^1 \cup i^3)$. Hence,
there are

$4!\,\binom{n}{4k-4}\binom{4k-4}{4}\binom{4k-8}{k-2,\,k-2,\,k-2,\,k-2}$

choices of $i^1, \dots, i^4$ that satisfy (D.9) and (D.10) without having to relabel their superscripts.
To obtain the factor of 3 in (D.12), we note that the constellation of $i^1, \dots, i^4$ described by
(D.9) and (D.10) is such that $i^1$ intersects with $i^2$ and $i^4$. Alternatively, $i^1$ can intersect with
$i^2$ and $i^3$, or $i^3$ and $i^4$, to give a constellation satisfying (D.9) and (D.10) after relabeling of index
superscripts. □
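As a sanity check on the count $3 \cdot 4! \binom{n}{4k-4}\binom{4k-4}{4}\binom{4k-8}{k-2,k-2,k-2,k-2} = 3\,n!/\{(n-4k+4)!\,((k-2)!)^4\}$, the smallest case $k = 2$ can be enumerated by brute force. The sketch below assumes that, for $k = 2$, the cyclic constellation amounts to four distinct 2-element sets whose union has four elements, each covered exactly twice (an ordered 4-cycle). The function names are illustrative.

```python
from itertools import combinations, product
from math import comb, factorial


def product_form(n: int, k: int) -> int:
    """3 * 4! * C(n, 4k-4) * C(4k-4, 4) * multinomial(4k-8; k-2, k-2, k-2, k-2)."""
    multinomial = factorial(4 * k - 8) // factorial(k - 2) ** 4
    return 3 * factorial(4) * comb(n, 4 * k - 4) * comb(4 * k - 4, 4) * multinomial


def closed_form(n: int, k: int) -> int:
    """3 n! / ((n - 4k + 4)! ((k-2)!)^4)."""
    return 3 * factorial(n) // (factorial(n - 4 * k + 4) * factorial(k - 2) ** 4)


def brute_force_k2(n: int) -> int:
    """Count ordered quadruples of distinct 2-subsets of {0..n-1} forming a 4-cycle."""
    pairs = [frozenset(c) for c in combinations(range(n), 2)]
    total = 0
    for quad in product(pairs, repeat=4):
        if len(set(quad)) < 4:          # a repeated tuple belongs to the other constellation
            continue
        union = set().union(*quad)
        if len(union) != 4:
            continue
        # every element of the union must lie in exactly two of the four tuples
        if all(sum(x in s for s in quad) == 2 for x in union):
            total += 1
    return total


if __name__ == "__main__":
    for n in (5, 6):
        assert brute_force_k2(n) == closed_form(n, 2) == product_form(n, 2)
```

The enumeration grows as $\binom{n}{2}^4$, so it is only practical for very small $n$.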


Proof of Lemma \S.‘A As in the proof of Lemma 13.11 without loss of generality, we assume 
{P, q) = (1, 2). For any given i, j G Tin, 2k), 


(D.13) Eo h'^ 


1 : y:E4^R<“')^R<y),.(Ry>),.(R™; 


1 Cl J"CJ 

I' 1=^ 


Since i^, i \ and j \ are tuples in V{n,k), if |i D j| < 2d, or equivalently |i U j| > 

4fc — 2d, by Lemma rB.ll H. all summands on the right hand side of (ID.131) equal zero, and thus 

= 0 . 


Eo [h^ 
Suppose $|i \cap j| = 2d$. If $i^1, j^1 \in \mathcal{P}(n, k)$ are such that $i^1 \subset i$ and $j^1 \subset j$, we define $i^2 = i \setminus i^1$
and $j^2 = j \setminus j^1$ to simplify notation. If

(D.14)    $|i^1 \cap j^1| = d$ and $|i^2 \cap j^2| = d$,

then the necessary condition in Lemma B.1 is satisfied. Since $i^1 \cup j^1$ and $i^2 \cup j^2$ are disjoint,
independence gives

(D.15)    $E_0\big[h\big(R^{(12)}_{i^1}\big)\, h\big(R^{(12)}_{i^2}\big)\, h\big(R^{(12)}_{j^1}\big)\, h\big(R^{(12)}_{j^2}\big)\big] = (\zeta_d^h)^2.$

Similarly, if

(D.16)    $|i^1 \cap j^2| = d$ and $|i^2 \cap j^1| = d$,

then (D.15) holds too.

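Now we give the count for how many combinations of $i^1$ and $j^1$ satisfy (D.14). Since $|i \cap j| =
2d$, there are $\binom{2d}{d}$ choices for the set $i^1 \cap j^1$, which determines $i^2 \cap j^2$. For each such choice,
there are then $\binom{2k-2d}{k-d}$ choices for each of $i^1 \setminus (i^1 \cap j^1)$ and $j^1 \setminus (i^1 \cap j^1)$, which determine $i^1$ and
$j^1$. Hence, there are $\binom{2d}{d}\binom{2k-2d}{k-d}^{2}$ choices of $(i^1, j^1)$ satisfying (D.14). Analogously, there are
also $\binom{2d}{d}\binom{2k-2d}{k-d}^{2}$ choices of $(i^1, j^1)$ satisfying (D.16). In total, there are

(D.17)    $2\binom{2d}{d}\binom{2k-2d}{k-d}^{2}$

summands in (D.13) with the value $(\zeta_d^h)^2$.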
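The count $2\binom{2d}{d}\binom{2k-2d}{k-d}^{2}$ can be confirmed by direct enumeration for small $k$ and $d$. The sketch below fixes two $2k$-sets $i$ and $j$ sharing exactly $2d$ elements, runs over all $k$-subsets $i^1 \subset i$ and $j^1 \subset j$, and counts the pairs for which either $|i^1 \cap j^1| = |i^2 \cap j^2| = d$ or $|i^1 \cap j^2| = |i^2 \cap j^1| = d$, where $i^2 = i \setminus i^1$ and $j^2 = j \setminus j^1$. The concrete choice of $i$ and $j$ is an arbitrary illustration.

```python
from itertools import combinations
from math import comb


def formula(k: int, d: int) -> int:
    """2 * C(2d, d) * C(2k-2d, k-d)^2."""
    return 2 * comb(2 * d, d) * comb(2 * k - 2 * d, k - d) ** 2


def brute_force(k: int, d: int) -> int:
    """Enumerate (i1, j1) and count the pairs matching either constellation."""
    i = set(range(2 * k))                                        # |i| = 2k
    j = set(range(2 * d)) | set(range(2 * k, 4 * k - 2 * d))     # |j| = 2k, |i & j| = 2d
    count = 0
    for i1_tuple in combinations(sorted(i), k):
        i1 = set(i1_tuple)
        i2 = i - i1
        for j1_tuple in combinations(sorted(j), k):
            j1 = set(j1_tuple)
            j2 = j - j1
            first = len(i1 & j1) == d and len(i2 & j2) == d      # first constellation
            second = len(i1 & j2) == d and len(i2 & j1) == d     # second constellation
            count += first or second
    return count


if __name__ == "__main__":
    for k, d in [(2, 1), (2, 2), (3, 1), (3, 2)]:
        assert brute_force(k, d) == formula(k, d)
```

The two constellations cannot hold simultaneously when $d \geq 1$, because each one uses up all $2d$ shared elements, which is why the two counts simply add.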

If $d = 1$, then no constellations for $i^1$ and $j^1$ other than the ones given by (D.14) and (D.16)
yield a non-zero value for $E_0\big[h\big(R^{(12)}_{i^1}\big)\, h\big(R^{(12)}_{i^2}\big)\, h\big(R^{(12)}_{j^1}\big)\, h\big(R^{(12)}_{j^2}\big)\big]$. Therefore, we deduce from
(D.13) that, for $d = 1$,





It remains to prove the formula for when d = 2. In this case, besides (ID.141) and (ID.161) . 
there is one other constellation for i^, i^, j^, so that the necessary condition in Lemma [B.l H I 
is satisfied. If the multiset (i^ U U U j^, /m) is such that 


(D.18) 

|i'nji| = 

j^n-Z\ = \-Z 

nf| = |f nr| = 

1 

and 


(D.19) 

I 2 if i belongs to any one of 
fmii) = < 

nj 

or j^ni^ 

1 1 otherwise 





then 







(D.20) 

Eo 

h (r[P^) h 

(r[2 '')) h (rJi'")] 

h{ 

4“’)1 



Now we count: For a fixed pair (i,j) such that |inj| = 4, there are 4! choices for the singletons 
D j^, (~) in and fl i^. Given each such choice for these singletons, there are 

choices for each one of and j^, hence there are 


4 ! 


2fc-4 

k-2 


2 
summands on the right hand side of (D.13) with the value $\eta^h$. Combining with the count (D.17)
for summands with the value $(\zeta_d^h)^2$, we conclude that if $d = 2$ then


^2d 



/2k - 2d\ 
\ k-d ) 


(Cd^r+4! 


2fc-4 

k-2 



12 


/2fc-4\ Y2fc\ 
\k-2) [kj 


-2 


+ 277 "] . 


□ 


Appendix E. Proofs for Section 4

Here, we prove Lemmas E.1 and E.2 that were used in the proof of Theorem 4.1.

Lemma E.l. The martingale differences from satisfy the convergences 

2 m \ ^ 


(E.l) 

(E.2) 

(E.3) 


En 


En 


En 


2 \ 

-^Eo 

^ 1^2 / 

V 1=2 / 


0 


when d = 1, and the Lf convergences 

4 m 


(E.4) 

(E.5) 

(E. 6 ) 

when d = 2. 


En 


En 


£Eo[(Df,)2|j-„,,_i] - 4 Q) {(C2^f+ 677'^}j 

4 ™ /l,\ 4 \ ' 

^^Eo[(dX,)^|J-„,_i] -4Q {(C2)' +V}j 

^gEo 


0 , 


0 , 


Proof. When $d = 1$, for the convergences in (E.1), (E.2) and (E.3), it is sufficient to show
that, as $m, n \to \infty$,


^2 ^ 


.,2 ^ 


(E.7) 


^Eo[(Df,)2], —Y^EoiiDl^r] k\Cf) 
1^2 
m 

EEo 


1^2 


1=2 


2 


and 
Varo


^0, 


1^2 


Varo 

2 ^ 

^^Eo[(i^^,)2|J-„,_i] 

1^2 

^0, 

Varo 

m 

^^Eo[(i^f,)Vn.;-l] 

Z=2 

-5- 0. 


When $d = 2$, for the convergences in (E.4) and (E.5), it suffices to show that, as $m, n \to \infty$,

m 4 ^ 4 

4r) {{C^f + 6v^}, 

1^2 ^ 

^ 4 /jL\ 4 

(E.9) ^ J {(C2f + 2r7'‘}, 

m 2 /iL\ 2 

^ ( 1 ) C2^ «nd 

/=2 


(E.IO) 


Varo 


^0, 


1^2 


Varo 

4 ^ 

EEEo[(i^^;)^l-^«4-i] 

^0, 


1^2 


Varo 

2 ^ 

Z-2 

-0. 


We will first show the convergences of expectations in (lEJl) and M . Suppose d = 1 or 2 is 
the order of degeneracy of h under Hq. By Lemma I2.ir il and (hi), the terms that are 
summed to form D^i are i.i.d. such that 


,2d 


^2d ^-1 

;;;yEvaro 


rEo[(d?: 




p=i 


u, 


(pi) 


,2d 


-{I - l)Varo 


U, 


( 12 ) 


It follows that

(E.ll) 


^ 1=2 


2m 


Similarly, by Lemma HHi) and (hi), we have that 

-,2d _ X)77,2'i 


(E.12) 

(E.13) 




Z=2 


2m 


Yarn 


W, 


U, 


( 12 ) 


( 12 ) 


2-/ L J 

1=2 


2m 


U, 


( 12 ) 


and 
By Lemma and {in).


Yarn 


U, 


( 12 ) 


— lEn 






(E.14) 


^f^^^+0(n-3) if d=l, 

1;f( 2)"{(C2)' + 6^'‘} + 0(’^-®) if d=2. 

Since is a rank-based U-statistic with the induced kernel function of degree 2k, via 

Lemma 13.21 Lemma m^) applies to give 


Yarn 


W, 


( 12 ) 


= En 


(E.15) 


H^+0(u-3) 




2dJ n 


2d 


if d = 1, 


(E.16) Yaro 






t;?© {(C2)' + V} + 0(«-") if d = 2. 

Moreover, Lemma EHi) yields that 

-I- O (u“^) if d = 1, 

■(2)%0(n-3) jf ^^2. 

Plugging (E.14), (E.15) and (E.16) into (E.11), (E.12) and (E.13) for $d = 1$ and $d = 2$,
respectively, and taking the limit, we obtain the convergences in (E.7) and (E.9).

Next, we show that the variances in (E.8) and (E.10) converge to zero. For $d \in \{1, 2\}$, write

2d ^ 


Z=2 

^2d 


( m 1 — 1 

’ / V 9 


m 



EE^o 

piPp 

^ n,l — l 

+ 2E E Eo 

tt(.P^) rr(^0 

‘E n,l—l 

1^2 p^l 



Z— 3 l<p<q<l 




and notice that the first sum on the right-hand side is a constant because, by Lemma nn ii), 
Eo =Eo X(P) =Eo 

-!> 0, it suffices to show 


We observe that in order to show Yaro 


(E.17) —Yaro 

mr 


By exactly analogous arguments, it suffices to show 

(E.18) ^Yaro 

mr 


■YZ2M{Did^\d^n,l-l] 


m 



YE®" 

tt{p^) tt{q0 

^ n,Z—1 

/—3 l<p<q<l 




0 . 


m 



E E Eo 

h h 

^ n.,Z—1 

Z— 3 l<p<q<l 




0 and 


(E.19) 


^2d 


fYaro 


771 



E E Eo 

tt{p0 tj{Q^) 

^ n,,Z—1 

Z—3 l<p<<j<Z 




in order to prove Yaro ^ Ya= 2 ^o[iDni)'^\iFn,i-i] 


Yaro 




0. 
We first prove (E.17). For $p < q < l$, consider


:= Eo 


Tj{P^)Tr{l^) 




= Eo 


Tfipl) Tt{1^) 


x(p),x(«) 


which is a function of and alone. Since 


^ /(r(pO^ ^ ^ ^ 


for a function $f : (\mathbb{R}^2)^k \to \mathbb{R}$ that is permutation symmetric in its $k$ arguments, and since
the rank vectors $R^{(1)}, \dots, R^{(m)}$ are independent and uniformly distributed on $\mathfrak{S}_n$ under
$H_0$, the conditional expectation $C^{(pq)}$ is in fact a function of the tuple $(R_1^{(pq)}, \dots, R_n^{(pq)})$ that
is symmetric in its $n$ arguments. Therefore, Lemma 2.1 applies to the collection of $C^{(pq)}$,
$1 \le p \ne q \le m$. The variance in (E.17) is thus


Varo 


m 



Y (7(p«) 

= Y (™ “ 9)^Varo 

(j{pq) 

/—3 l<p<q<l 

l<p<g<m— 1 



= —m(m — 2)(m — l)^Varo 


C{12) 


Now under the asymptotic regime $m, n \to \infty$, (E.17) holds if $\mathrm{Var}_0[C^{(12)}]$ is of order $O(n^{-4d-1})$.

Suppose 2 < I < u < m. Then, by definition. 




rr(lO rr(20 


X(l) y(2) 


= En 




X(l) y(2) 


It follows that 
Eq 




(E.20) 


= Eq 
= Eo 
= Eo 


Eo 
Eo 
(C(12)) 




rr(lO 


x(i) x(2) 


x(i),x(2) 

Eq 




x(i) x(2) 


where (E.20) follows from independence of $X^{(l)}$ and $X^{(u)}$. Applying Lemma B.2, we deduce that
$E_0[(C^{(12)})^2]$ is of order $O(n^{-4d-1})$. This concludes the proof as an application of Lemma 2.1
shows that $C^{(12)}$ has mean zero, and thus $\mathrm{Var}_0[C^{(12)}] = E_0[(C^{(12)})^2]$.

The proof of (IE.181) and (|E.19I) proceeds line by line as the proof of (IE.171) . where for all 
l<p^q<mwe replace by or , define alternatively as 


(7(P‘?) := Eo 


^n,l — l 


or (7(P‘?) := Eo 


TjiP^) Tj(q0 


‘A nA—l 


and apply Lemma B.3 or Lemma B.4. We omit the details.


□ 


Lemma E.2. For d = \ or 2, the martingale differences from satisfy the Lyapunov 

conditions 


(E.21) 

(E.22) 


,4d ™ 


,4d ™ 


■^Eo , — ^Eo 0 and 


1^2 


1^2 


,2d 


■^Eo ^ 0 


as m, n 


00 . 
Proof. Since T,Z 2 '^o[iD^i)'^\Tn.i-i], E ™2 and ^“ 2^0 [{D^i)‘^\Tn,i-i] are

nonnegative random variables, it suffices to show that all three expectations converge to zero, 
that is, 


,4d 


,4d 


1^2 


.,2d 


We first show it for ^ E/II 2 Lemmaand (Hi), -Df; is a sum of ^ — 1 

centered i.i.d. random variables. On expansion, we have that 


i-i 


En 




p^l 

= {l- l)Eo 


l<p<q<l 


It follows that 


(E.23) 


n 


TO 


■E^o (D^) 


1^2 


n 


4d 


En 




Now recall from (IE.14I) that the variance of is of order 0{n Furthermore, 


En 


= ^o\{Ky - 


= Eq 
= Eq 




is of order 0{n by Lemma [XlT nb Substituting these into (|E.23p we conclude that 


4d 

E [{D^i)^] = 0(m-^) —^ 0 as 


m, n —> 00 . 


1^2 


The proof for ^E/Il 2 ®o [(^n/)^] ^E/Il 2®'0 is similar. On expansion, we 

have 


1^2 

2d 

(E-25) ^EEo[(i?f0' 


n 


4d 


1^2 


2d 


n 


= ^ „ Eo 




En 




6 


U, 


(i2)y 


TO 

3 

TO 

3 


En 


Eq 




by Lemma o:*) and {in). By Lemmas 13.ll wl and 13.21 since has order of degeneracy 2d, 
Eo[(W^i^^L] 3'iid Eo[(1eE^)^] are of order 0{n~^‘^) and 0(n“^‘^) respectively. Another ap¬ 
plication of Lemma IXTI iO gives that Eq = 0{n~‘^^) and Eg = 0{n~‘^). 

On substituting these into (IE.2411 and (IE.251) we get that both ^ E™ 2 Eo and 

^EE2Eo {D^iT are of order 0{m and converge to 0 as to, n —> 00 . □ 


Proof of Corollary \4.‘d\ It suffices to show that ^{Sp — Sp) = Op(l), in which case the corollary 
is implied by the fact that ^Sp — > N{0, 1) as given in Theorem 14.11 and the value of cf'”'* in 
Table 1. By the decomposition in (2.5), the statistic Sp from may be written as


5p= E 

l<p<q'<m 


n — 2 


.(p?) 


r(P9) 




+ 1 n + 1 

Expanding the square in the summands on the right-hand side, we obtain that 


(E.26) Sp = 


n — 2 
n + \ 


(» + !)“ 


n — 2 
n + \ 


Hp2 + 


(n + l) 


2 Mr 2 - Mp2 


recall the definition of St and Sp. Note that since S'p, St and Sp have mean zero, it holds that 


Mpr ■— 1^0 


^(pq) .j-iPl) 


{n + 1)^ 


6(n- 2) 


flp2 


n — 2 
n + l 


flp2 


{n + 1)^ 


and hence (IE.261) can be rewritten as 
2 


Sp = 


n — 2 
n -I- 1 


5, 


9 


(n -I- 1)2 (n + 1)2 


p{pQ)^{p<i) _ 


l<p<g<m 


m\ 

2 / 




Since St = Op{l) by Theorem 14.11 in order to prove the assertion that ^{Sp — Sp) = 

Op(l), it thus suffices to show that 


6n(n — 2) 
m(n -I- 1)2 


pipi)pipi) 


m\ 

2 


Mpr 


0 . 


l<p<q'<m 

We show this by proving convergence to zero in $L^2$, for which we need to argue that


(E.27) 


36n^(n — 2)' 
m?{n + 1)^ 


■En 


y^ p{p<i)pip‘}) _ 


l<p<(7<m 


Mpr 


0 . 


Note that Lemma [2.11 applies to the collection of statistics plP'J) r^pt) _ By Lemma Enji) and 
(iw), the term in (|E.27|1 equals 

18n^(n — 2)^(m — 1) 


(E.28) 


(n -I- l)'^m 




Since ^ ” ++i)irn —" ~ ^(1) ^ convergence from (IE.271) it remains to 

show that 

2" 


Varn 


p(12).^(12) 


= En 






0 . 


However, using the inequality $2xy \le x^2 + y^2$, we see that


0 < Varo 


p(12).^(12) 


< Eg 




which is of order 0{n ^) by Lemma lOT zzl. 


□ 
Appendix F. Proofs for Section 5

Unlike in other sections, here all the rank-based U-statistics will be treated as functions of
the original data $X_1, \dots, X_n$ in our presentation.


Proof of Theorem 5.1. In this proof, all operators $E[\cdot]$, $\mathrm{Cov}[\cdot]$, $\mathrm{Var}(\cdot)$, $P(\cdot)$ are with respect to
a general distribution in $\mathcal{D}_m$.

{i): Let Ur be the (™)-vector Then Ur is a U-statistic taking values in 

with the (™)-dimensional vector-valued kernel 

hr(X,,Xj) = f/ir(x|^'^\x'^'^M 

V -^ / l<p<.q<m 

of degree k = 2. Here, i ^ j index any pair of samples. Note that Sr = ||Ur||i — and 

based on Theorem id.ll we have under the regime ^ —>■ 7 that the test (j)a{Sr) rejects Hq when 

||Ur ||2 > = O(-yn). Recall that fij- = value of Ci'^ is 

given in Table [TJ By the triangle inequality 

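$\|U_\tau\|_2 \ge \|\Theta_\tau\|_2 - \|U_\tau - \Theta_\tau\|_2,$


it suffices to show that as $n \to \infty$, uniformly over $\mathcal{D}_m$,


$P\big(\|U_\tau - \Theta_\tau\|_2 > C\sqrt{n}\big) < 1 - \beta$


for some constant $C > 0$ that only depends on $\beta$ and $\gamma$. For any pair $i \ne j$, let $\mathbf{h}_{\tau,1}(X_i) =
E[\mathbf{h}_\tau(X_i, X_j) \mid X_i]$ and define the canonical functions (Borovskikh, 1996, p. 8)


(F.1)    $g_1(X_i) := \mathbf{h}_{\tau,1}(X_i) - \Theta_\tau,$

(F.2)    $g_2(X_i, X_j) := \mathbf{h}_\tau(X_i, X_j) - \mathbf{h}_{\tau,1}(X_i) - \mathbf{h}_{\tau,1}(X_j) + \Theta_\tau.$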


Since the Kendall kernel hr is bounded, HgilH and ||g 2||2 are both less than (™)M for a certain 
constant $M > 0$ that does not depend on $n$ and $m$. Suppose $d \in \{1, 2\}$ is the order of degeneracy
for the kernel $\mathbf{h}_\tau$. By Borovskikh (1996, Corollary 8.1.7), we have that for any $t > 0$,

U(||Ur-0r||2 >t) < Uiexp|-C 2 n(^J^ ' |, 


where C\,C 2 > 0 are universal constants and = M(™) X]c=o ^ = M(™) ■ Using 

the fact that — i-n-^ letting t = C^/n for some C > 0, we get 

(F.3) P(||U. - e.lh > CV^) < C. exp {-C, } . 

for large enough $n$ as $m/n \to \gamma$. The proof for (i) is completed by picking $C$ large so that the
right hand side of (F.3) is less than $1 - \beta$ as $m/n \to \gamma \in (0, \infty)$.

(ii): Recall that $E[T_\tau] = \|\Theta_\tau\|_2^2$, and the test $\phi_\alpha(T_\tau)$ rejects $H_0$ when $T_\tau > \frac{4m}{9n} z_{1-\alpha}$. In what
follows we let $\|\Theta_\tau\|_2 = C\sqrt{n}$ for an arbitrary fixed constant $C > 0$. By Chebyshev's inequality,
for large enough $n$ under the regime $m/n \to \gamma$,

(F.4)    $1 - E[\phi_\alpha(T_\tau)] = P\Big(T_\tau - \|\Theta_\tau\|_2^2 \le \frac{4m}{9n} z_{1-\alpha} - \|\Theta_\tau\|_2^2\Big)$
         $\le P\Big(\big|T_\tau - \|\Theta_\tau\|_2^2\big| \ge \Big|\frac{4m}{9n} z_{1-\alpha} - \|\Theta_\tau\|_2^2\Big|\Big)$
         $\le \frac{\mathrm{Var}(T_\tau)}{\big(\frac{4m}{9n} z_{1-\alpha} - \|\Theta_\tau\|_2^2\big)^2},$

where the first inequality is true when $C$ is taken large enough. We will finish the proof by
showing that as $m/n \to \gamma$, the rightmost term of (F.4) is less than $1 - \beta$ when $C$ is chosen large
enough. To that end we will study the variance of the statistic $T_\tau$. Note that $T_\tau$
is a U-statistic with the kernel of degree 4

h^(x„x„x,,xo := E 

l<p<q'<m 

where is the function h}^ defined in (lO) when h is the Kendall kernel hr- Here it is 
important to note that the kernel also depends on the number of variables $m$ since it is a
sum of $\binom{m}{2}$ terms. By Lemma 5.2.1A in Serfling (1980), the variance of $T_\tau$ satisfies


(F.5) Var(r^) := 


n 


-1 4 


E 

C=1 


n — 4 
4 — c 


c< 


16Cl 




for a constant C > 0 that does not depend on C; recall definition (12.61) for the kernel h = . 

Claim. C,i'' < C^nm{m — 1) 


Proof of the claim. For seven distinct sample indices ii,..., iy € {1,..., n}, 
cf" = E[h^ (X,,,..., (X,„ ..., X,,)] - ||0,||4 

l<p<q'<m 
l<p' <.q' Km 

IKpKqKm 
IKp' Kq' Km 

where the last equality is true by the definition of and independence. Since \hr\ < 1, it 

is true that < 2. This in turns implies that is less than the quadratic form 

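$2\Theta_\tau^{T} J_m \Theta_\tau$, where $J_m$ is the $\binom{m}{2}$-by-$\binom{m}{2}$ positive semidefinite matrix with all 1's. Since the
largest eigenvalue of $J_m$ is $\binom{m}{2}$, given that $\|\Theta_\tau\|_2 = C\sqrt{n}$,

$2\Theta_\tau^{T} J_m \Theta_\tau \le C^2 n\, m(m-1),$

and the claim is proved. □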


Returning to the other quantities in (IF.51) . since \h^\ < 1, it is easy to show that each of 

, Ck Ck is bounded by 2(™)^. Hence, under the regime ^ —>■ 7 , together with the 
claim above, (IF.51) gives that for all large n, 

(F. 6 ) Var(T^) < + 3-f‘^C). 


Recalling that ||0r||^ = Cy/n, and applying 


(F.7) 


l-E[cl){Tr)] < 


to (jEl), we get that 
771^(16(7^ -I- 37^(7) 

C'4n2-C2|mzi_« + ^z2_„ 


for all large $n$. Since $C$ is arbitrary, by choosing it large enough the right hand side of (F.7)
can be made less than $1 - \beta$ as $m/n \to \gamma$. □

The following lemma is needed for the proof of Theorems 5.2 and 5.3.

Lemma F.1. Let $I = [0, 1 - \epsilon] \subset \mathbb{R}$ for some small fixed $\epsilon > 0$. For fixed positive integers
$c_1, \dots, c_b$ such that $\sum_{i=1}^{b} c_i = c$, suppose $X = (X_1, \dots, X_c)^T \sim N(0, \Sigma)$ is a $c$-variate normal
random vector with an invertible block diagonal covariance matrix

$\Sigma = \Sigma(\rho) = \mathrm{diag}\big(B_1(\rho), \dots, B_b(\rho)\big),$

where each $B_i(\rho)$ is a $c_i$-by-$c_i$ matrix with 1's on the diagonal and all off-diagonal entries equal
to some $\rho \in I$. If $H : \mathbb{R}^c \to \mathbb{R}$ is a bounded function such that $E[H(X)] = 0$ when $\rho = 0$, then
there exists a constant $C = C(H, \epsilon) > 0$ such that $|E[H(X)]| \le C\rho$ for all $\rho \in I$.

Proof. For all $\rho \in I$, the matrix $\Sigma(\rho)$ is invertible and the precision matrix $\Sigma^{-1}(\rho)$ is a smooth
function of $\rho$. Hence, the set of distributions $N(0, \Sigma(\rho))$ forms a curved exponential family.
By standard results on exponential families (Lehmann and Casella, 1998, Theorem 5.8), the
expectation $E[H(X)]$ is a continuous function of $\rho$ that is differentiable on $(0, 1 - \epsilon)$. The lemma
is thus implied by the mean value theorem and the compactness of $[0, 1 - \epsilon]$. □
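To make the statement of the lemma concrete, consider the simplest case of a single $2 \times 2$ block and the bounded function $H(x_1, x_2) = \mathrm{sign}(x_1 x_2)$. This example is an illustration chosen here and does not appear in the paper. For a standard bivariate normal pair with correlation $\rho$, the classical orthant-probability identity gives $E[H(X)] = (2/\pi)\arcsin\rho$, which vanishes at $\rho = 0$. The sketch below evaluates the ratio $E[H(X)]/\rho$ on a grid in $(0, 1 - \epsilon]$ and reads off a valid constant $C$.

```python
import math


def expected_sign_product(rho: float) -> float:
    """E[sign(X1 * X2)] for a standard bivariate normal pair with correlation rho."""
    return 2.0 / math.pi * math.asin(rho)


def lipschitz_constant(eps: float, grid_size: int = 200) -> float:
    """Largest value of E[H(X)] / rho over a grid of rho in (0, 1 - eps]."""
    grid = [i * (1.0 - eps) / grid_size for i in range(1, grid_size + 1)]
    return max(expected_sign_product(rho) / rho for rho in grid)


if __name__ == "__main__":
    eps = 0.05
    C = lipschitz_constant(eps)
    # The bound |E[H(X)]| <= C * rho holds on the whole grid by construction.
    for i in range(1, 201):
        rho = i * (1.0 - eps) / 200
        assert abs(expected_sign_product(rho)) <= C * rho + 1e-12
    print(f"C(H, eps={eps}) is approximately {C:.4f}")
```

In this example the ratio increases with $\rho$, so the constant is attained at the right endpoint $1 - \epsilon$ and stays below 1.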

Proof of Theorem 5.2. The value of $T_\tau$ depends only on the rank vectors $R^{(1)}, \dots, R^{(m)}$. Without
loss of generality, we may thus assume that each $X^{(p)}$ is centered with unit variance, i.e.,
$(X^{(1)}, \dots, X^{(m)}) \sim N(0, R)$, where $R = (\rho^{(pq)})$ is a correlation matrix, with 1's on the diagonal.

It suffices to prove the result under the restriction that $\theta$ can only take values in a closed
interval $[0, 1 - \epsilon]$, for some fixed small $\epsilon > 0$. In other words, in the statement of the theorem,
replace the set of distributions $\mathcal{N}_m(\|\Theta_\tau\|_2 \ge C;\ \theta^{(pq)} = \theta)$ under the infimum by the subset

(F.8)    $\{N \in \mathcal{N}_m(\|\Theta_\tau\|_2 \ge C;\ \theta^{(pq)} = \theta) : \theta \in [0, 1 - \epsilon]\}.$

To see that this restriction can be made, note that $\theta > 1 - \epsilon$ implies that $\|\Theta_\tau\|_2 > (1 - \epsilon)\binom{m}{2}^{1/2} =
O(m)$. Since $O(m) > O(\sqrt{n})$ asymptotically under the regime $m/n \to \gamma$, by Theorem 5.1(ii),
nothing is lost by ignoring the normal distributions in $\mathcal{N}_m(\|\Theta_\tau\|_2 \ge C;\ \theta^{(pq)} = \theta)$ with $\theta > 1 - \epsilon$.

In addition, for all $p \ne q$, we have the classical result (Kruskal, 1958, p. 823),

$\rho^{(pq)} = \rho = \sin\big(\tfrac{\pi}{2}\theta\big)$

when $\theta^{(pq)} = \theta$. As a consequence, for the covariance matrix $R$ to be positive definite it must
be that $\theta > -\frac{2}{\pi}\arcsin\big[\frac{1}{m-1}\big]$ (Horn and Johnson, 2013, Theorem 7.2.5). Hence, as $n$ and $m$
grow, it can be seen that $\|\Theta_\tau\|_2 < 1/\sqrt{2}$ when $\theta$ lies in the interval $\big(-\frac{2}{\pi}\arcsin\big[\frac{1}{m-1}\big], 0\big)$. As
such, by taking the constant $C$ to be larger than $1/\sqrt{2}$ when necessary, it suffices to consider
the subset of distributions (F.8) under the infimum.
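The classical relation between Kendall's tau and the Pearson correlation of a bivariate normal pair, $\rho = \sin(\pi\theta/2)$, or equivalently $\theta = (2/\pi)\arcsin\rho$, can be checked by simulation. The following is a minimal sketch with standard-library tools only. The sample size, seed and target value of $\theta$ are arbitrary choices made for illustration, and the $O(n^2)$ tau computation is meant for small samples.

```python
import math
import random


def kendall_tau(x, y):
    """Sample Kendall's tau: average sign of (x_a - x_b)(y_a - y_b) over pairs a < b."""
    n = len(x)
    s = 0
    for a in range(n):
        for b in range(a + 1, n):
            prod = (x[a] - x[b]) * (y[a] - y[b])
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))


def sample_bivariate_normal(n, rho, rng):
    """Draw n pairs from a standard bivariate normal with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(0)
    theta = 0.3                                  # target population Kendall's tau
    rho = math.sin(math.pi * theta / 2.0)        # matching Pearson correlation
    x, y = sample_bivariate_normal(800, rho, rng)
    tau_hat = kendall_tau(x, y)
    print(f"theta = {theta}, rho = {rho:.4f}, sample tau = {tau_hat:.4f}")
```

With a few hundred observations the sample tau should land close to the target $\theta$, up to sampling error of a few hundredths.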

In what follows, the operators E[-],Var[-] and Cov[-] are all with respect to an to- variate 
normal distribution for (XF),... ^X^"*)) in (IF. 81) . Recall from (IF. 51) that 


p(pq) = p = sin 


-1 4 

E 

C=1 


n — 4 
4 — c 


Cc” 


Var(r^) := 

Our proof now begins with the Chebyshev inequality from (El: 

Var(r,) ^ 


(F.9) i-E[<^„(ro]< 


11 )^ 


< 


(|^ Zi _„-||0 " 2^2 - O -,. ^2 Sm 


(^zi_„)2-|^zi_„||0.|li + ||0 
where the last inequality is true since $\binom{n}{4}^{-1}\binom{4}{c}\binom{n-4}{4-c} \le B n^{-c}$ for a constant $B > 0$. To finish
the proof, it suffices to show that for each $c = 1, \dots, 4$, a constant $C_c(\alpha, \beta, \gamma) > 0$ exists such
that for large enough $n$ (depending on $C_c$),

(F.IO) 


BCc-n- 


. 9n 9„ ^i-a||^T||2 w ||0r||2 




< 




whenever $\|\Theta_\tau\|_2 \ge C_c$. We may then take $C = \max_{c=1,\dots,4} C_c$.
For notational convenience, we define 

/ij := ^ nh ^..., Xlf (Xjf 

l<p<g<m 
l<p'<.q' <m 

for any tuples i = (ii,..., i^), j = (ji, ■ ■ ■, jd) S V{n, 4) such that |i H j| = c. Then 




(FJl) 

Since the ratio 


Cc^ =/ij-||0.l 

BWQrWi 


is bounded for all values of || 0 r|| 2 j we have for each c = 1,..., 4 that 


5110 


Il4^—c 


t\\2' 




^ zi _ J | 0.||2 + ||0 


T||2 


0 


as $m/n \to \gamma$. Upon substituting (F.11) into (F.10), we see that the proof is finished if the below
claim is shown to be true. □


Claim. Under $\theta^{(pq)} = \theta$, there exists for each $c = 1, \dots, 4$, a constant $C_c(\alpha, \beta, \gamma) > 0$ such that
for large enough $n$ (depending on $C_c$),


(F.12) 


whenever ||0r||2 = 9 


5/ijn- 


(^^l-a)^-^^l-a||0, 

4g)>c,. 


le^lll 


< 


1-/3 

5 


Proof of the claim when c = 1. Using independence, we find that for any four distinct indices 
1 < i,j,k < n, 

(F.13) /ij= ^ 9^E[hr{X.<f^\l<if‘‘^)hriyi’f'^'\x^f‘‘'^)] + 

l<p<g<m 
l<p' <.q' <m 
\{p,q}n{p' ,q'}\>l 

'• -V-' 

( 1 ) 

l<p<g<m 
l<p' <.q' <m 
\{p,q}n{p' ,q'}\=0 

' -V-' 

( 2 ) 

Since \hr\ < 1, the term (1) is bounded in absolute value by —(™) = 0{m)\\Qr\\2- 

To bound (2), note that when |{p, < 7 } fl {p', q'}\ = 0, the expectation term 

(F.14) E[hr{x’f'^\x^f^^)hr{X^''^'\x’'P'^'^)] 
equals 0 when $\theta = 0$ due to the independence of the coordinate pairs indexed by $(p, q)$ and $(p', q')$. Moreover,
for $\theta \ne 0$, the variables involved jointly follow an 8-variate normal distribution
with block diagonal covariance matrix, where each block has 1's on the diagonal and all its
off-diagonal entries equal to $\rho = \sin(\pi\theta/2)$. By Lemma F.1, the expectation (F.14) is bounded
in absolute value, up to a multiplying constant, by $\theta$, and hence (2) is bounded by $O(m^4)\theta^3 =
O(m)\|\Theta_\tau\|_2^3$ in absolute value.

Using the above bounds for (1) and (2) we get that the left hand side of (IF.121) is less than 

^(lie.lli + lie.lli) 

Under the regime ^ —> 7 , we see that the expression in the above display can be made less 
than when || 0 r ||2 and n are large enough. □ 

Proof of the claim when c = 2. Again, using independence, we find that 


1 \ 2 


(F.15) 9/ij = ^4 (e 

'-.-' 

( 1 ) 

V V 

( 2 ) 

'-V-' 

(3) 

^ 20E[/i,(x|P'«'\ X^P'«'Vr(x|P«\ x5^«))h,(xf«'\ xl^’'«'))], 

'- V -' 

(4) 

where each summation is over all pairs 1 < p < q < m and 1 < p' < q' < m, and i, j, k, I are 
any 4 distinct indices in {1,..., n}. We now derive bounds for the absolute values of the terms 
(1),(2),(3),(4). 


Term (1); We claim that |(1)| < 0 (to^)(1 - 1 - || 0 r|| 2 )- To show this, observe that (1) equals 
(F.16) ^ 4 (E[/i,(x|p«\x5^«V,(x(^«\x^^«))])2+ 

l<p<g<m 

^ 4(E[h,(x|^«\x5^«))h,(xK«'\x(P'«'^)])2. 

|{p,g}n{p',g'}|=0 

l<p<g<m 

l<p^<g^<m 

Since |/i^| < 1, the hrst sum in (IF.161) is bounded by a term of order O(to^). Considering the 
second sum, an expectation 

(F.17) E[h,(x|P«\x5^’«))h,(x(p'«'\xi,P'«'))] 

with {p,q} 7 ^ {p',q'} equals 0 when 0 = 0 by independence. Moreover, X,-^'^\ X-^ \ 

and X^^ ^ ^ jointly follow an 8 -variate normal distribution with block diagonal covariance matrix 
as in Lemma fF.ll By that lemma and the fact that p = sin( 7 r 6 (/ 2 ), we obtain that (|F.17I) is 
bounded in absolute value by 9 times a constant, hence the second sum in (|F.16I) is bounded 






















in absolute value by a term equal to O(m^)||0T-||i- Gathering the bounds for the two sums in 
(IF.161) gives the claimed bound for the absolute value of term (1). 

Term (2): We claim that |(2)| < 0{m'^)\\Qr\\2- Indeed, since \hr\ < 1, it is easy show that 
( 2 ) is bounded in absolute value by = (™)|| 0 t ||2 = 0 (rn^)\\<dr\\ 2 - 

Terms (3) and (4).- We claim that |(3)|, |(4)| < O(to^)(|| 0 t -||2 + || 0 t|| 2 )- We give details for 
the proof of bound for |(3)|. The bound for (4) is analogous. We write (3) as 


|{p,9}n{p'.9'}l>i 

l<p<g<m 

^ 20E [hr )hr (X|P'^^ )hr , X[^«^ )], 

|{P:9}n{p',9'}l=0 

l<p<g<m 

l<p^<g^<m 

where the first sum is bounded by 26((™)[(™) — ("*^^)] = O(m^)|| 0 r ||2 because \hr \ < 1- The 
expectation 

equals 0 when {p, q} n {p', g'}| = 0, and Lemma rF.il can be invoked to show the second sum in 
(IF.18|) is bounded in absolute value by O(m^)||0T-|||. 

Having established the bounds for the terms (1) — (4) in (IF.151) . we find that when c = 2 the 
left hand side of (IF.121) is less than 

OK)n-^(l + || 0.||2 + || 0 .||i) 

(^zi_„) 2 -^zi_„|| 0 .||i + ||0.||4’ 

which, under ^ —)> 7 , can be made to be less than when || 0 r ||2 and n are large enough. □ 


Proof of the claim when c > 3. For c = 3 or c = 4, we may proceed similarly, using again the 
boundedness of hr and Lemma IFTI We note that if c = 3, then |/ij| < 0(m^)(l + || 0 r|| 2 ) and 
omit further details. □ 


Proof of Theorem 5.3. By Han and Liu (2014), under $H_0$,
(F.19) 


sup 

tGR 


Pc 


^n(4C^'^) — 41ogTO + loglogm < — exp(—exp(—1/2)/\/8^) 


as $m/n \to \gamma$ (recall that $m/n \to \gamma$ is a special case of the regime $\log m = o(n^{1/3})$ considered
in Han and Liu (2014)), where $\zeta_1^{\tau} = 1/9$ as given in Table 1 and $\exp(-\exp(-t/2)/\sqrt{8\pi})$
is the distribution function of a Gumbel-distributed random variable $G$. Defining $H(t)$ to be
the distribution function of the transformed random variable $4\zeta_1^{\tau}(G + 4\log m - \log\log m)$,
(F.19) is equivalent to

(F.20) 


sup [PoiVnStf^^ <t) — H{t)\ 
tGR 


Hence, the critical value of the test $\phi_\alpha(S_\tau^{\max})$ is calibrated by the $(1 - \alpha)$-quantile of the
distribution function $H(\cdot)$, and $H_0$ is rejected if the test statistic exceeds this value.
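The $(1 - \alpha)$-quantile used for calibration has a closed form: solving $\exp(-\exp(-t/2)/\sqrt{8\pi}) = 1 - \alpha$ gives $t = -\log(8\pi) - 2\log\log\{1/(1-\alpha)\}$. The sketch below computes this quantile and the induced rejection threshold. The threshold function assumes that the statistic being compared is the maximum squared pairwise Kendall's tau, so that the cutoff is $4\zeta_1^{\tau}(t + 4\log m - \log\log m)/n$ with $\zeta_1^{\tau} = 1/9$. That normalization and the function names are assumptions made for illustration, not the paper's code.

```python
import math


def gumbel_type_cdf(t: float) -> float:
    """Limiting distribution function exp(-exp(-t/2) / sqrt(8*pi))."""
    return math.exp(-math.exp(-t / 2.0) / math.sqrt(8.0 * math.pi))


def gumbel_type_quantile(level: float) -> float:
    """Inverse of gumbel_type_cdf at a probability level in (0, 1)."""
    return -math.log(8.0 * math.pi) - 2.0 * math.log(-math.log(level))


def max_tau_squared_threshold(n: int, m: int, alpha: float, zeta1: float = 1.0 / 9.0) -> float:
    """Assumed cutoff for the maximum squared Kendall's tau over all pairs (see lead-in)."""
    q = gumbel_type_quantile(1.0 - alpha)
    return 4.0 * zeta1 * (q + 4.0 * math.log(m) - math.log(math.log(m))) / n


if __name__ == "__main__":
    q95 = gumbel_type_quantile(0.95)
    assert abs(gumbel_type_cdf(q95) - 0.95) < 1e-12
    print("0.95-quantile of the limit:", round(q95, 4))
    print("threshold for n=64, m=128:", round(max_tau_squared_threshold(64, 128, 0.05), 4))
```

Because the centering term $4\log m - \log\log m$ grows with $m$, the cutoff rises slowly as the number of variables increases for a fixed sample size.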

Suppose, towards a contradiction, that the constant $C$ indicated by the theorem exists, and
let $X$ be $m$-variate normal with $\theta^{(pq)} = \theta = \sqrt{2}\,C(\gamma' n)^{-1}$ for some $\gamma' < \gamma$ and all $1 \le p < q \le m$,
such that $\|\Theta_\tau\|_2 > C$ as $m, n \to \infty$. By $\rho^{(pq)} = \rho = \sin\big(\tfrac{\pi}{2}\theta\big)$ (Lindskog et al., 2003), the
distribution of $X$ belongs to the set of equicorrelation alternatives $\mathcal{N}_m(\|\Theta_\tau\|_2 \ge C;\ \theta^{(pq)} = \theta)$, and we
will use $P(\cdot)$ to denote the probability operator under this distribution. To finish the proof it
suffices to show that


(F.21) 


as — 


sup 

teR 

Since 


P f y/n 


n max 

l<p<q'<m 


C/M _ y/2C{'y'n)-^ < 0 


0 


•y/n max 

l<p<q'<m 


_ V2C{yn)-^ - 


< 


V2C 

^'y/n 


by reverse triangular inequality, if (IF.21|) is true, by uniform continuity of H{-) (as i^(•) is a 
continuous distribution function) and the fact that —> 0 , elementary arguments show 

that 

sup|P(V^|5r"l <t)- H{t)\ 0 

tGR 

as $m/n \to \gamma$. As such, the asymptotic power of our test under this alternative also
equals $\alpha$, leading to the desired contradiction since $\beta > \alpha$.

It remains to show (F.21). Since the value of $S_\tau^{\max}$ depends only on the rank vectors
$R^{(1)}, \dots, R^{(m)}$, we can assume without loss of generality that the components of $X$ have
variance 1. Let $\Gamma^0$ and $\Gamma$ be two $\binom{m}{2} \times \binom{m}{2}$ matrices, whose components are indexed by
$((p, q), (p', q'))$ for $1 \le p < q \le m$, $1 \le p' < q' \le m$, and defined as






)hri^l 


(p'l') j^ip'g') 


^)] and 






2C^ 


Here, again, $E[\cdot]$ is the expectation operator under the alternative distribution of $X$. We note
that $\Gamma^0$ and $\Gamma$ are in fact the covariance matrices of the Hájek projection of the vector-valued
U-statistic $U_\tau$ defined in the proof of Theorem 5.1. Applying Theorem 2.1 from Chen (2016),
we obtain that


(F.22) 

(F.23) 


sup < i) — Fo{t)\ —0 and 

tGR 


sup 

tern. 


P(\/H inax 


V2CiYn)~^\<t)- F{t) 


0 


as $m/n \to \gamma$, where $F_0(\cdot)$ and $F(\cdot)$ are, respectively, the cumulative distribution functions of
$\|Z_0\|_\infty$ and $\|Z\|_\infty$ for multivariate normal random vectors $Z_0 \sim N(0, \Gamma^0)$ and $Z \sim N(0, \Gamma)$.

Now by Lemma IFTI for each pair {{p, q), {p', q')), 


l^(p>9),(p'>9') ^(p,9),(p',9') I ^ P F dr Oijl ), 

and hence by a comparison lemma in Chernozhukov et al.l ( 2013 . Lemma 3.1), 

(F.24) sup |Po(i) — -^^(01 ^ 0{n~^/^) (1 V log(0(TOn)))^^^ , 

(GR 


where the right hand side of (F.24) converges to 0 as $m/n \to \gamma$. Collecting (F.20), (F.22), (F.23)
and (F.24) leads to (F.21). □
Table 2. Simulated size of tests when $X^{(1)}, \dots, X^{(m)}$ are i.i.d. $t_{3.2}$ data. For
each combination of (m, n) and each test, the sizes are computed from 5000
independently generated datasets.

  n    Statistic        m = 4      8     16     32     64    128    256    512

  16   $S_r$            0.060  0.065  0.062  0.067  0.071  0.063  0.071  0.072
       $S_\tau$         0.069  0.079  0.080  0.090  0.094  0.093  0.086  0.089
       $T_\tau$         0.088  0.096  0.102  0.113  0.120  0.110  0.113  0.114
       $S_{\rho_s}$     0.046  0.050  0.052  0.057  0.059  0.053  0.053  0.055
       $T_{\rho_s}$     0.079  0.093  0.099  0.107  0.111  0.107  0.104  0.109
       $S_{t^*}$        0.079  0.098  0.115  0.112  0.123  0.122  0.111  0.121
       $Z_{t^*}$        0.079  0.092  0.098  0.098  0.111  0.104  0.096  0.099

  32   $S_r$            0.066  0.078  0.076  0.081  0.076  0.089  0.079  0.086
       $S_\tau$         0.059  0.069  0.067  0.077  0.073  0.071  0.070  0.077
       $T_\tau$         0.064  0.078  0.075  0.087  0.081  0.082  0.080  0.086
       $S_{\rho_s}$     0.047  0.054  0.052  0.061  0.056  0.053  0.056  0.058
       $T_{\rho_s}$     0.062  0.075  0.072  0.082  0.080  0.079  0.072  0.083
       $S_{t^*}$        0.056  0.081  0.085  0.090  0.088  0.078  0.087  0.085
       $Z_{t^*}$        0.062  0.069  0.067  0.081  0.077  0.077  0.079  0.078

  64   $S_r$            0.073  0.083  0.095  0.095  0.102  0.097  0.096  0.091
       $S_\tau$         0.057  0.061  0.062  0.065  0.058  0.058  0.065  0.059
       $T_\tau$         0.058  0.064  0.066  0.069  0.061  0.064  0.067  0.062
       $S_{\rho_s}$     0.048  0.053  0.055  0.055  0.050  0.052  0.057  0.048
       $T_{\rho_s}$     0.057  0.061  0.065  0.067  0.060  0.064  0.059  0.062
       $S_{t^*}$        0.045  0.074  0.064  0.070  0.068  0.070  0.069  0.063
       $Z_{t^*}$        0.054  0.061  0.058  0.064  0.065  0.062  0.063  0.064

  128  $S_r$            0.072  0.089  0.107  0.112  0.101  0.109  0.110  0.115
       $S_\tau$         0.047  0.061  0.053  0.061  0.052  0.056  0.053  0.055
       $T_\tau$         0.049  0.063  0.053  0.064  0.054  0.060  0.054  0.058
       $S_{\rho_s}$     0.043  0.059  0.049  0.056  0.048  0.052  0.048  0.051
       $T_{\rho_s}$     0.048  0.062  0.052  0.060  0.055  0.057  0.058  0.054
       $S_{t^*}$        0.041  0.066  0.070  0.071  0.060  0.058  0.052  0.058
       $Z_{t^*}$        0.050  0.055  0.058  0.062  0.053  0.056  0.055  0.055
Table 3. Simulated power of tests when data are generated from the multivariate
normal (MVN), multivariate t (MVT) and multivariate power exponential (MVPE)
distributions with three different values for the dependency signal $\|\Theta_\tau\|_2^2$.
All pairwise (population) Kendall's tau correlations
$\theta^{(pq)}$, $1 \le p < q \le m$, are equal to the same value $\theta$ so that $\|\Theta_\tau\|_2^2 = \binom{m}{2}\theta^2$.
For each combination of (m, n) and each test, the power is calculated from 500
independently generated datasets.




                  $\|\Theta_\tau\|_2 = 0.1$  $\|\Theta_\tau\|_2 = 0.3$  $\|\Theta_\tau\|_2 = 0.7$
Statistic   n\m      64    128    256           64    128    256           64    128    256

MVN
Sr           64   0.094  0.054  0.070        0.182  0.108  0.092        0.424  0.218  0.114
Tr           64   0.100  0.068  0.078        0.194  0.110  0.090        0.426  0.228  0.134
^max         64   0.046  0.046  0.020        0.040  0.058  0.046        0.056  0.054  0.058
             64   0.070  0.058  0.070        0.178  0.114  0.080        0.448  0.222  0.110
Cai & Ma     64   0.076  0.076  0.060        0.190  0.116  0.086        0.456  0.278  0.130

Sr          128   0.130  0.086  0.056        0.342  0.164  0.080        0.794  0.444  0.176
Tr          128   0.132  0.088  0.058        0.352  0.174  0.084        0.806  0.446  0.186
^max        128   0.062  0.064  0.052        0.046  0.058  0.060        0.094  0.058  0.060
            128   0.142  0.072  0.066        0.378  0.172  0.084        0.832  0.514  0.198
LRT         128   0.094      -      -        0.204      -      -        0.396      -      -
Cai & Ma    128   0.134  0.064  0.068        0.386  0.172  0.096        0.834  0.520  0.204

Sr          256   0.256  0.108  0.096        0.780  0.358  0.198        0.992  0.838  0.476
Tr          256   0.262  0.114  0.094        0.782  0.364  0.200        0.992  0.830  0.470
^max        256   0.048  0.050  0.046        0.064  0.056  0.058        0.124  0.082  0.052
            256   0.282  0.126  0.094        0.816  0.420  0.224        1.000  0.880  0.502
LRT         256   0.166  0.086      -        0.450  0.152      -        0.876  0.370      -
Cai & Ma    256   0.282  0.124  0.110        0.812  0.422  0.234        1.000  0.882  0.494

MVT
Sr           64   0.506  0.866  0.998        0.628  0.896  0.998        0.802  0.926  0.998
Tr           64   0.130  0.080  0.078        0.232  0.128  0.096        0.488  0.234  0.114
^max         64   0.080  0.066  0.060        0.086  0.074  0.060        0.110  0.074  0.068

Sr          128   0.554  0.912  0.998        0.806  0.948  1.000        0.962  0.990  1.000
Tr          128   0.130  0.102  0.094        0.384  0.210  0.114        0.796  0.494  0.244
^max        128   0.064  0.060  0.054        0.080  0.064  0.066        0.114  0.074  0.076

Sr          256   0.694  0.924  1.000        0.972  0.992  1.000        1.000  1.000  1.000
Tr          256   0.268  0.130  0.084        0.740  0.348  0.188        0.998  0.832  0.456
^max        256   0.076  0.062  0.072        0.110  0.066  0.076        0.186  0.102  0.078

MVPE
Sr           64   0.052  0.042  0.022        0.128  0.056  0.044        0.358  0.122  0.060
Tr           64   0.114  0.076  0.076        0.222  0.110  0.082        0.462  0.216  0.134
^max         64   0.056  0.050  0.032        0.046  0.050  0.034        0.062  0.054  0.036

Sr          128   0.074  0.038  0.028        0.274  0.094  0.036        0.744  0.314  0.112
Tr          128   0.128  0.084  0.056        0.398  0.174  0.096        0.836  0.454  0.214
^max        128   0.038  0.054  0.050        0.050  0.056  0.044        0.084  0.060  0.046

Sr          256   0.134  0.066  0.050        0.638  0.256  0.102        0.992  0.794  0.306
Tr          256   0.232  0.152  0.100        0.768  0.370  0.184        0.998  0.862  0.450
^max        256   0.052  0.036  0.060        0.074  0.040  0.060        0.120  0.064  0.062
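
Each power entry reported here is a rejection proportion over 500 independently generated datasets, so it carries Monte Carlo error of its own. The following is a minimal sketch of the usual binomial standard error for such a proportion. It is not taken from the simulation code behind the tables, and the function names and the 1.96 normal quantile are illustrative assumptions.

```python
import math


def mc_standard_error(p_hat: float, n_rep: int = 500) -> float:
    """Binomial standard error of a rejection rate estimated from n_rep datasets."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n_rep)


def mc_interval(p_hat: float, n_rep: int = 500, z: float = 1.96):
    """Approximate 95% normal-theory interval for the true rejection rate (assumed z = 1.96)."""
    se = mc_standard_error(p_hat, n_rep)
    return max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)


# A rejection rate near the nominal 0.05 level has a standard error of roughly 0.01.
se_at_nominal = mc_standard_error(0.05)
# The standard error is largest at p_hat = 0.5, roughly 0.022 for 500 datasets.
se_at_half = mc_standard_error(0.5)
```

Under this approximation, differences of one or two hundredths between neighbouring entries are plausibly within simulation noise.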
Table 4. Simulated power of tests when data are generated from multivariate
normal (MVN), multivariate t (MVT) and multivariate power exponential
(MVPE) distributions with three different values for the dependency signal
$\|\Theta_\tau\|_2$. For each distribution, the scatter matrix
$\Sigma = (\sigma_{ij})$ is a pentadiagonal matrix with 1's on the diagonal and
equal values for the non-zero entries $\sigma_{ij}$, $1 \le |i - j| \le 2$. For
each combination of $(m, n)$ and each test, the power is calculated from 500
independently generated datasets.




                  $\|\Theta_\tau\|_2 = 0.1$  $\|\Theta_\tau\|_2 = 0.3$  $\|\Theta_\tau\|_2 = 0.7$
Statistic   n\m      64    128    256           64    128    256           64    128    256

MVN
Sr           64   0.096  0.070  0.062        0.170  0.108  0.084        0.462  0.210  0.120
Tr           64   0.100  0.080  0.068        0.176  0.118  0.090        0.462  0.224  0.122
^max         64   0.056  0.062  0.048        0.064  0.036  0.048        0.090  0.050  0.056
^Sr          64   0.068  0.070  0.062        0.162  0.092  0.078        0.478  0.206  0.116
Cai & Ma     64   0.086  0.074  0.060        0.176  0.104  0.086        0.500  0.228  0.132

Sr          128   0.140  0.078  0.072        0.390  0.176  0.104        0.862  0.434  0.190
Tr          128   0.138  0.084  0.070        0.398  0.186  0.100        0.870  0.438  0.186
^max        128   0.066  0.038  0.042        0.078  0.048  0.044        0.156  0.092  0.036
^Sr         128   0.126  0.070  0.064        0.428  0.178  0.092        0.914  0.518  0.180
LRT         128   0.126      -      -        0.300      -      -        0.776      -      -
Cai & Ma    128   0.118  0.062  0.066        0.414  0.180  0.094        0.928  0.506  0.198

Sr          256   0.246  0.120  0.078        0.818  0.394  0.168        1.000  0.906  0.476
Tr          256   0.246  0.120  0.082        0.808  0.402  0.164        1.000  0.908  0.474
^max        256   0.086  0.038  0.064        0.136  0.080  0.064        0.618  0.136  0.066
^Sr         256   0.268  0.120  0.090        0.864  0.430  0.172        1.000  0.952  0.510
LRT         256   0.220  0.092      -        0.780  0.314      -        1.000   0.84      -
Cai & Ma    256   0.258  0.120  0.094        0.864  0.420  0.196        1.000  0.954  0.522

MVT
Sr           64   0.484  0.870  0.998        0.634  0.900  0.998        0.832  0.946  0.998
Tr           64   0.116  0.080  0.072        0.214  0.128  0.082        0.440  0.218  0.112
^max         64   0.078  0.070  0.060        0.092  0.068  0.066        0.132  0.086  0.076

Sr          128   0.560  0.912  0.998        0.830  0.950  0.998        0.988  0.992  1.000
Tr          128   0.124  0.102  0.086        0.370  0.180  0.130        0.884  0.482  0.242
^max        128   0.068  0.062  0.070        0.102  0.064  0.076        0.238  0.102  0.078

Sr          256   0.712  0.932  1.000        0.978  0.988  1.000        1.000  1.000  1.000
Tr          256   0.256  0.134  0.076        0.804  0.344  0.170        1.000  0.892  0.480
^max        256   0.094  0.066  0.096        0.220  0.092  0.090        0.638  0.226  0.132

MVPE
Sr           64   0.054  0.038  0.026        0.120  0.062  0.030        0.324  0.110  0.048
Tr           64   0.120  0.074  0.072        0.204  0.108  0.094        0.462  0.212  0.128
^max         64   0.056  0.046  0.034        0.078  0.046  0.038        0.094  0.048  0.038

Sr          128   0.060  0.036  0.028        0.250  0.082  0.038        0.822  0.272  0.092
Tr          128   0.128  0.082  0.058        0.386  0.150  0.086        0.906  0.446  0.190
^max        128   0.034  0.060  0.050        0.062  0.058  0.046        0.168  0.076  0.050

Sr          256   0.122  0.058  0.026        0.716  0.226  0.082        1.000  0.828  0.268
Tr          256   0.226  0.126  0.072        0.842  0.374  0.144        1.000  0.910  0.452
^max        256   0.058  0.030  0.056        0.146  0.044  0.066        0.578  0.106  0.076
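
The pentadiagonal scatter matrix used for Table 4 has ones on the diagonal and a common value on the entries with $1 \le |i - j| \le 2$. The sketch below builds such a matrix and checks a simple sufficient condition for positive definiteness. It is an illustration only, not the authors' simulation code, and all function names are hypothetical. The conversion from a scatter entry $\rho$ to Kendall's tau uses the standard identity $\tau = (2/\pi)\arcsin(\rho)$ for elliptical distributions; that the simulations were parametrized this way is an assumption.

```python
import math


def banded_scatter(m, value, bandwidth=2):
    """m x m matrix with 1 on the diagonal and `value` where 1 <= |i - j| <= bandwidth."""
    return [
        [1.0 if i == j else (value if abs(i - j) <= bandwidth else 0.0)
         for j in range(m)]
        for i in range(m)
    ]


def is_strictly_diagonally_dominant(mat):
    """Sufficient (not necessary) check: a symmetric matrix with positive diagonal
    that is strictly diagonally dominant is positive definite."""
    return all(
        abs(row[i]) > sum(abs(x) for j, x in enumerate(row) if j != i)
        for i, row in enumerate(mat)
    )


def kendall_tau_from_rho(rho):
    """Kendall's tau implied by correlation rho under an elliptical model (assumed link)."""
    return 2.0 / math.pi * math.asin(rho)


sigma = banded_scatter(6, 0.1)            # small example; off-band entries are zero
dominant = is_strictly_diagonally_dominant(sigma)
tau = kendall_tau_from_rho(0.1)           # tau implied by an in-band entry of 0.1
```

With a common in-band value below 0.25 every row satisfies the dominance condition, so the matrix is a valid scatter matrix; larger values may still be positive definite but need a direct check.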
Table 5. Simulated power when contaminating 5% of data generated from
$N_m(0, \Sigma_{\mathrm{band2}})$, where $\Sigma_{\mathrm{band2}} = (\sigma_{ij})$
has diagonal entries $\sigma_{ii} = 1$ and off-diagonal entries
$\sigma_{ij} = 0.1$ if $1 \le |i - j| \le 2$ and $\sigma_{ij} = 0$ if
$|i - j| \ge 3$. For each combination of $(m, n)$ and each test, the power is
calculated from 500 independently generated datasets.


Statistic n\m       4      8     16     32     64    128

Sr         16   0.058  0.058  0.038  0.072  0.086  0.092
Sr         16   0.074  0.090  0.094  0.096  0.116  0.120
Tr         16   0.094  0.108  0.122  0.108  0.144  0.146
Sp         16   0.034  0.068  0.056  0.070  0.076  0.074
Tp         16   0.088  0.096  0.118  0.116  0.136  0.152
St*        16   0.078  0.114  0.114  0.130  0.150  0.162
Zt*        16   0.100  0.112  0.118  0.096  0.112  0.138

Sr         32   0.072  0.100  0.078  0.110  0.106  0.104
Sr         32   0.086  0.112  0.114  0.130  0.136  0.126
Tr         32   0.090  0.130  0.128  0.132  0.150  0.138
Sp         32   0.072  0.098  0.086  0.110  0.106  0.096
Tp         32   0.084  0.126  0.114  0.138  0.136  0.128
St*        32   0.068  0.114  0.130  0.122  0.148  0.112
Zt*        32   0.088  0.120  0.130  0.118  0.146  0.116

Sr         64   0.110  0.156  0.128  0.158  0.172  0.182
Sr         64   0.134  0.164  0.176  0.216  0.222  0.204
Tr         64   0.138  0.176  0.182  0.220  0.240  0.202
Sp         64   0.114  0.166  0.152  0.190  0.190  0.192
Tp         64   0.134  0.176  0.180  0.204  0.228  0.200
St*        64   0.110  0.168  0.148  0.184  0.184  0.168
Zt*        64   0.130  0.170  0.174  0.192  0.184  0.190

Sr        128   0.224  0.290  0.332  0.342  0.384  0.414
Sr        128   0.306  0.390  0.408  0.436  0.454  0.484
Tr        128   0.308  0.392  0.418  0.440  0.462  0.484
Sp        128   0.296  0.376  0.392  0.418  0.444  0.470
Tp        128   0.302  0.398  0.414  0.434  0.452  0.424
St*       128   0.198  0.292  0.338  0.356  0.370  0.412
Zt*       128   0.274  0.336  0.402  0.388  0.412  0.414


